Advertisement effectiveness measurement

ABSTRACT

A dashboard to integrate gadgets and present data output from the gadgets in an integrated user interface. The gadgets dynamically collect information about an advertisement or an ad campaign associated with the advertisement as the information is collected from various sources during the ad campaign, each of some of the gadgets processing collected information and outputting the processed information in real time.

TECHNICAL FIELD

This document generally relates to information management.

BACKGROUND

The Internet provides access to a wide variety of content items, e.g., video and audio files, web pages, and news articles. Such access to the content items has enabled opportunities for targeted advertising. For example, content items can be identified to a user by a search engine in response to a query submitted by the user. The query can include one or more search terms, and the search engine can identify and, optionally, rank the content items based on the search terms in the query and present the content items to the user (e.g., according to the rank). The query can also be an indicator of the type of information of interest to the user. By comparing the user query to a list of keywords specified by an advertiser, it is possible to provide targeted advertisements to the user.

Another form of online advertising is advertisement syndication, which allows advertisers to extend their marketing reach by distributing advertisements to additional partners. For example, third party online publishers can place an advertiser's text or image advertisements on web pages that have content related to the advertisement. As the users are likely interested in the particular content on the publisher webpage, they are also likely to be interested in the product or service featured in the advertisement. Accordingly, such targeted advertisement placement can help drive online customers to the advertiser's website.

The serving of the advertisements can be improved by evaluating the effectiveness of the advertisements. One technique for evaluating the effectiveness of an advertisement is to survey an audience for advertisement recognition and brand linkage after an advertising campaign has run. The measure of advertisement recognition can, for example, be based on the percentage of a survey audience that recognizes the advertisement, and the measure of brand linkage can, for example, be based on the percentage of the survey audience that correctly identifies the featured product and/or brand of the advertisement. An advertisement can be brand obfuscated, i.e., branding and/or product information can be removed from the advertisement, and an audience can be surveyed to measure the brand linkage and advertisement recognition. Post-campaign ad effectiveness studies may show, for example, whether online user behavior as manifested by web site visitations and search activity has increased due to the display of ads. The experience gained in one ad campaign may be used in designing future ad campaigns.

SUMMARY

This document describes a system that provides a dashboard to integrate various gadgets and present the data output from the gadgets in an integrated user interface. The gadgets dynamically collect information about one or more advertisements, or one or more ad campaigns associated with the one or more advertisements in real time from various sources during the ad campaigns. Each of some of the gadgets processes the collected information and outputs the processed information in real time to allow an advertiser to evaluate the performance of an advertisement or ad campaign in real time.

In some examples, some of the gadgets may collect information related to the timings of both a content presentation on a web page and a web page access by a user, and use the collected information to determine if the user accessed the web page while the content was presented on the web page. For example, a user's device (e.g., a web browser on a personal computer or mobile phone) may obtain non-ad content (e.g., the main text of a web page) from one location and advertisements (which can include, e.g., images or text) from another location and display both of them to the user at the same time. In some examples, a television can obtain advertisements (which can include, e.g., images or videos) to show alongside the main television programs that the viewer is watching. In some examples, a radio can obtain advertisements (which can include, e.g., audio recordings) to play alongside the main radio programs that the listener is listening to. The ads and the main content can come from different sources, and there may be multiple log files. The gadgets can correlate the records from the logs to determine which ads were shown to the user while the user was accessing certain web pages, watching certain television programs, or listening to certain radio programs.
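The correlation of records from separate logs can be illustrated with a minimal sketch. It is not the described system's implementation: the record fields (`device_id`, `timestamp`, `page`, `ad_id`), the function name `correlate_logs`, and the 30-second window are all assumptions chosen for the example.

```python
from collections import defaultdict

def correlate_logs(page_log, ad_log, max_gap_seconds=30.0):
    """Pair page-access records with ad records for the same device.

    A pair is kept when the two timestamps differ by at most
    max_gap_seconds (a hypothetical window, not a value from the text).
    """
    # Index the ad log by device so each access only scans relevant ads.
    ads_by_device = defaultdict(list)
    for ad in ad_log:
        ads_by_device[ad["device_id"]].append(ad)

    matches = []
    for access in page_log:
        for ad in ads_by_device.get(access["device_id"], []):
            gap = abs(ad["timestamp"] - access["timestamp"])
            if gap <= max_gap_seconds:
                matches.append((access["page"], ad["ad_id"], gap))
    return matches

# Hypothetical log records (timestamps in seconds).
page_log = [
    {"device_id": "d1", "timestamp": 100.0, "page": "news/home"},
    {"device_id": "d2", "timestamp": 500.0, "page": "sports/scores"},
]
ad_log = [
    {"device_id": "d1", "timestamp": 102.5, "ad_id": "ad-7"},
    {"device_id": "d2", "timestamp": 900.0, "ad_id": "ad-9"},
]
matches = correlate_logs(page_log, ad_log)
```

In this sketch only the first device yields a pair, because the second device's ad record falls far outside the assumed window.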

In general, in another aspect, a plurality of gadgets dynamically collect information about at least one of an advertisement or an ad campaign associated with the advertisement as the information is received from a plurality of sources during the ad campaign, each of some of the gadgets processing collected information and outputting a visual representation of the processed information as the information is received; and a dashboard integrates the gadgets and presents the data output from the gadgets in an integrated user interface.

Implementations may include one or more of the following features. Two or more of the gadgets can provide information on effectiveness of corresponding two or more ad campaigns. The dashboard can show ad creatives of the two or more ad campaigns sorted according to the effectiveness of the ad campaigns. At least one of the gadgets can process private data at a client site and implement a security procedure to prevent unauthorized access to the private data. At least one of the gadgets executing at a client site can process data that is private to a third party, and the gadget can implement a security procedure to prevent unauthorized access to the private third party data. At least one of the gadgets can process data indicating a first time related to when a web page was accessed on a device and a second time related to when an advertisement was displayed on the accessed web page to determine if the device accessed the web site while the advertisement was displayed. At least one of the gadgets can determine on-line behaviors of users whose devices have accessed the web site while the content item was displayed. The information can include at least one of data indicating a performance of the advertisement, statistical data associated with the advertisement, or data indicating recognition of a brand associated with the advertisement. The dashboard can provide information on at least one of brand health, campaign effectiveness, competitive brand tracking, market research, offline ad effectiveness, or mix media recommendation.

At least two gadgets can communicate with each other such that a gadget is updated automatically in response to a change in another gadget. A first gadget can process raw data to generate first ad performance data, a second gadget can process the first ad performance data to generate second ad performance data, and when the first gadget updates the first ad performance data, the first gadget can push the updated first ad performance data to the second gadget to enable the second gadget to update the second ad performance data. The gadgets can include a first gadget that processes data output from a second gadget and a third gadget to generate combined data for output. The first gadget can correlate the output from the second gadget with the output from the third gadget to identify a correlation between the outputs from the second and third gadgets. Each of some of the gadgets can include an interactive user interface to allow a user to perform at least one of selecting information related to different ads, selecting information related to different brands, or selecting statistical information for an ad for different periods of time.
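The push-style update between a first and a second gadget can be sketched as a simple subscriber arrangement. The `Gadget` class, its method names, and the sample counts below are hypothetical and serve only to illustrate the idea; the text does not prescribe any particular mechanism.

```python
class Gadget:
    """Hypothetical gadget that recomputes its output when data is pushed to it."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform  # function applied to incoming data
        self.output = None
        self.subscribers = []       # downstream gadgets to notify

    def subscribe(self, other):
        self.subscribers.append(other)

    def receive(self, data):
        # Update this gadget's output, then push it to every subscriber.
        self.output = self.transform(data)
        for gadget in self.subscribers:
            gadget.receive(self.output)

# First gadget: raw (clicks, deliveries) per ad -> click-through rate per ad.
ctr_gadget = Gadget("ctr", lambda raw: {ad: c / n for ad, (c, n) in raw.items()})
# Second gadget: ranks ads by the first gadget's output, best first.
rank_gadget = Gadget("rank", lambda ctr: sorted(ctr, key=ctr.get, reverse=True))
ctr_gadget.subscribe(rank_gadget)

ctr_gadget.receive({"ad-a": (3, 100), "ad-b": (10, 200)})
first_ranking = rank_gadget.output
ctr_gadget.receive({"ad-a": (30, 100), "ad-b": (10, 200)})
second_ranking = rank_gadget.output
```

Here the second gadget never polls: its ranking changes only because the first gadget pushed updated performance data to it.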

One or more application programming interfaces (APIs) can be provided to enable exchange of data among the gadgets. One or more application programming interfaces can be provided to enable export of data from the gadgets or import of data to the gadgets. At least one of the gadgets can be configurable to enable selective view of a portion of the data output from the gadget. The dashboard can present the output from various gadgets in a web page. The integrated user interface can display at least one of text messages, charts, or graphs. Gadgets can provide information that is not associated with the advertisement or ad campaign. The integrated user interface can provide at least one of calendar, time, search trend, or news information. Application programming interfaces can be provided to enable the gadgets that are associated with the advertisement or ad campaign to communicate with gadgets that are not associated with the advertisement or ad campaign.

In general, in another aspect, at a computer, a plurality of gadgets dynamically collect information about at least one of an advertisement or an ad campaign associated with the advertisement as the information is received from a plurality of sources during the ad campaign; for each of some of the gadgets, the collected information is processed and a visual representation of the processed information is output; a dashboard presents the data output from the gadgets in an integrated user interface; and the data being presented are dynamically updated as the information is received during the ad campaign.

Implementations may include one or more of the following features. Two or more of the gadgets can provide information on effectiveness of corresponding two or more ad campaigns. Ad creatives of the two or more ad campaigns can be shown and sorted according to the effectiveness of the ad campaigns. Cross-gadget communication can be enabled in which a first gadget updates information output from the first gadget and sends a signal to a second gadget to cause the second gadget to update information output from the second gadget. One or more application programming interfaces can be provided to enable exchange of data among the gadgets. One or more application programming interfaces are provided to enable export of data from the gadgets or import of data to the gadgets.

In general, in another aspect, a web interface enables uploading of gadgets to an on-line gadget marketplace and downloading of one or more of the gadgets from the on-line gadget marketplace, each of some of the gadgets configured to dynamically collect and process information about at least one of an advertisement or an ad campaign associated with the advertisement as the information is received during the ad campaign; a set of application programming interfaces enables data to be imported to the gadgets or exported from the gadgets, or enables cross-gadget communication among the gadgets; and a storage stores the uploaded gadgets.

Implementations may include one or more of the following features. Gadget templates or components can be used to build gadgets. A security module controls access to one or more of the gadgets.

In general, in another aspect, a web interface is provided to enable uploading of gadgets to an on-line gadget marketplace and downloading of the gadgets from the on-line gadget marketplace, each of some of the gadgets configured to dynamically collect information about at least one of an advertisement or an ad campaign associated with the advertisement as the information is received during the ad campaign; uploaded gadgets are stored in a storage; a set of application programming interfaces (APIs) is provided to enable data to be imported to the gadgets or exported from the gadgets; and a set of APIs is provided to enable cross-gadget communication among the gadgets.

In general, in another aspect, an apparatus includes gadgets that dynamically collect information about an advertisement or an ad campaign associated with the advertisement in real time from various sources during the ad campaign, each of at least some of the gadgets processing collected information and outputting the processed information in real time; and means for integrating the gadgets and presenting the data output from the gadgets in an integrated user interface.

In general, in another aspect, data comprising a time point related to when a web page is accessed on a device and another time point related to when a content item is displayed on the accessed web page are obtained; an interval between the two time points is calculated; and a determination is made as to whether the device accessed the web site while the content was displayed on the web site based on a comparison of the interval to at least one predetermined threshold.

Implementations may include one or more of the following features. The device can include a computer or a cell phone. The data can include an internet connection speed of the device and a type of web browser used by the device. The content can include an advertisement. The determining can be based on whether the first time is before or after the second time. The at least one predetermined threshold can be determined by executing instructions on a computer, including calculating, for each of a plurality of devices, an interval between a first time related to when the device accessed a web page and a second time related to when a content item was displayed on the accessed web page; calculating for each interval a probability that the interval is a member of one of two groups that are each characterized by different statistics; and determining at least one predetermined threshold that classifies each interval into one of the two groups and reduces misclassifications. In some examples, the at least one predetermined threshold includes an upper threshold and a lower threshold. If the interval is less than the lower threshold, then there is a match. If the interval is above the upper threshold, then there is no match. If the interval is between the lower threshold and the upper threshold, then there is a high likelihood of misclassification, so the match is classified as uncertain and the interval is not used. The lower threshold is chosen to reduce the probability of identifying a match when there is no match. The upper threshold is chosen to reduce the probability of identifying a non-match when there really is a match. On-line behaviors of users whose devices have accessed the web site while the content item was displayed can be determined. A report of the on-line behaviors can be generated. On-line behaviors of users whose devices have not accessed the web site while the content item was displayed can be determined. A report of the on-line behaviors can be generated.
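The three-way classification using a lower and an upper threshold can be sketched as follows. The function name `classify_interval` and the threshold values of 2 and 10 seconds are hypothetical; the text does not give numeric thresholds here.

```python
def classify_interval(interval, lower, upper):
    """Classify an interval between a page access and a content display.

    Below the lower threshold: match. Above the upper threshold: no match.
    Anything in between is treated as uncertain and would not be used.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if interval < lower:
        return "match"
    if interval > upper:
        return "no match"
    return "uncertain"

# Hypothetical thresholds, in seconds.
LOWER, UPPER = 2.0, 10.0
labels = [classify_interval(x, LOWER, UPPER) for x in (0.4, 5.0, 42.0)]

# Only confidently classified intervals are kept for later analysis.
usable = [label for label in labels if label != "uncertain"]
```

Keeping a gap between the two thresholds trades coverage for accuracy: some intervals are discarded so that the remaining matches and non-matches are less likely to be misclassified.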

In general, in another aspect, for each of a plurality of devices, data comprising a first time related to when a device accessed a web page and a second time related to when a content item was displayed on the accessed web page are obtained; an interval between the first time and the second time associated with each of the plurality of devices is calculated; and a range of intervals for which the device is more likely than not to have accessed the web page while the content was displayed on the web page is determined.

Implementations may include one or more of the following features. The plurality of devices can include at least one of computers or cell phones. The data can include an internet connection speed of some of the plurality of devices and a type of web browser used by a group of the plurality of devices. The content item can include an advertisement. The range of intervals can be relative to a measurement of a first time related to when a web page is accessed on a device. The range of intervals can be determined by calculating for each interval a probability that the interval is a member of one of two groups that are each characterized by different statistics; and a threshold that classifies each interval into one of the two groups and reduces misclassifications can be determined, in which values below the threshold are within the range of intervals. One group can be characterized by statistics of a uniform distribution, a lognormal distribution, or a gamma distribution.
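One way to picture the threshold determination is a two-group mixture in which each interval is assigned a probability of belonging to the "true match" group. The sketch below rests on several assumptions that are not stated in the text: true matches follow a lognormal distribution, false matches follow a uniform distribution, the distribution parameters and the mixing weight are already known (fitting them is not shown), and the threshold is taken where the probability of a true match falls to one half. All function names and parameter values are hypothetical.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution with log-mean mu and log-std sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def uniform_pdf(x, low, high):
    """Density of a uniform distribution on [low, high]."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def prob_true_match(x, p_true, mu, sigma, low, high):
    """Posterior probability that interval x belongs to the true-match group."""
    t = p_true * lognormal_pdf(x, mu, sigma)
    f = (1.0 - p_true) * uniform_pdf(x, low, high)
    return t / (t + f) if (t + f) > 0 else 0.0

def find_threshold(p_true, mu, sigma, low, high, step=0.01):
    """Scan upward from the lognormal mode and return the first interval
    at which a true match is no longer more likely than a false match."""
    x = math.exp(mu - sigma ** 2)  # mode of the lognormal density
    if prob_true_match(x, p_true, mu, sigma, low, high) <= 0.5:
        return None  # true matches never dominate under these parameters
    while x <= high:
        if prob_true_match(x, p_true, mu, sigma, low, high) < 0.5:
            return x
        x += step
    return high

# Hypothetical parameters: intervals in seconds, false matches spread over a minute.
threshold = find_threshold(p_true=0.5, mu=0.0, sigma=0.5, low=0.0, high=60.0)
```

Under the assumed model, assigning each interval to the group with the larger posterior probability is the usual way to keep expected misclassification low for a single cut-off; intervals below the returned value would fall within the range treated as matches.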

In general, in another aspect, a collector obtains data including a first time related to when a web page is accessed on a device and a second time related to when a content item is displayed on the accessed web page; and an analyzer calculates an interval between the first time and the second time, and determines if the device accessed the web site while the content was displayed on the web site based on a comparison of the interval to a predetermined threshold.

Implementations may include one or more of the following features. The device can include a computer or a cell phone. The data can include an internet connection speed of the device and a type of web browser used by the device. The content item can include an advertisement. The analyzer can determine if the device accessed the web site while the content was displayed on the web site also based on whether the first time is before or after the second time. A server provides a plurality of gadgets and a dashboard, at least one of the gadgets receiving and processing data from the analyzer and outputting a visual representation of the processed data as the data is received from the analyzer, the dashboard presenting the data output from the plurality of gadgets in an integrated user interface at a client machine. At least one of the gadgets processes private data at the client machine and implements a security procedure to prevent unauthorized access to the private data.

In general, in another aspect, a collector obtains, for each of a plurality of devices, data including a first time related to when a device accessed a web page and a second time related to when a content item was displayed on the accessed web page; and an analyzer calculates an interval between the first time and the second time associated with each of the plurality of devices, and determines a range of intervals for which the device is presumed to have accessed the web page while the content was displayed on the web page.

Implementations may include one or more of the following features. The plurality of devices can include at least one of computers or cell phones. The data can include an Internet connection speed of some of the plurality of devices and a type of web browser used by a group of the plurality of devices. The content can include an advertisement. The range of intervals can be relative to a new measurement of the first time. The analyzer can determine the range of intervals by calculating for each interval a probability that the interval is a member of one of two groups that are each characterized by different statistics; and determining a threshold that classifies each interval into one of the two groups and reduces misclassifications, in which values below the threshold are within the range of intervals. One group can be characterized by statistics of a uniform distribution, a lognormal distribution, or a gamma distribution.

These and other aspects and features, and combinations of them, may be expressed as methods, apparatus, systems, means for performing functions, program products, and in other ways.

Advantages of the aspects and features include none, one, or more of the following. Advertisers can more easily access data that are useful for evaluating performance of ads. Useful data from various sources can be gathered and presented to advertisers in integrated user interfaces. The dashboards can be easily customized to satisfy the needs of the advertisers. More ad revenue can be generated for advertisers and publishers by understanding how users' behaviors may change after viewing a content item (e.g., an ad). Ad campaigns can be analyzed and improved for effectiveness or efficiency.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of an example online environment.

FIG. 2A is a block diagram of an example marketing platform.

FIG. 2B is a screen shot of an example dashboard.

FIGS. 3-6 are screen shots of example dashboards or portions of dashboards.

FIG. 7 is a block diagram of an example gadget pool.

FIG. 8 is a schematic diagram of an example information system.

FIG. 9 is a schematic diagram of example data logs.

FIGS. 10 and 11 are flow diagrams of example processes for determining if a match between two events should be classified as a “true” or a “false” match.

FIG. 12 is a flow diagram of an example process for determining a threshold between “true” matches and “false” matches.

FIG. 13 is a diagram showing example thresholds.

FIGS. 14-17 are example histograms of distances (i.e., delays) between events.

FIG. 18 is a table that includes example browser types, internet connection speeds, sequences of times, and thresholds.

FIG. 19 is a diagram of an example computing device.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

System Overview

FIG. 1 is a block diagram of an example online environment 100. The online environment 100 can facilitate the identification and serving of content items, e.g., web pages or advertisements (ads), to users. Advertisers (e.g., 102 a and 102 b, collectively referenced as 102), publishers (e.g., 106 a and 106 b, collectively referenced as 106), and end users (e.g., 300 a and 300 b, collectively referenced as 300) can access a search engine 112 and an advertisement management system 104 through a network 110. The advertisement management system 104 provides customizable dashboards that integrate various gadgets and present the data output from the gadgets in an integrated user interface to enable the advertisers 102 or other users of the system 104 to easily visualize information collected from many sources to evaluate effectiveness of advertisements 120 and ad campaigns 122 in real time throughout the duration of the ad campaigns 122.

In FIG. 1, although only two advertisers (102 a and 102 b), two publishers (106 a and 106 b) and two end users (300 a and 300 b) are shown, the online environment 100 may include many more advertisers, publishers and end users. The term “end user” refers to a consumer of content and ads provided by the publishers and advertisers. The advertiser 102 can refer to an entity (e.g., an individual or company) whose products or services are being advertised, or an entity that sponsors an advertisement 120. In some examples, an agent 302 can produce ads on behalf of the advertisers 102. The term “user,” as used in “user of the system 104,” may broadly refer to any entity that uses the system 104, including advertisers 102 and agents 302. The network 110 can be, e.g., a local area network (LAN), a wide area network (WAN), the Internet, or a combination thereof.

In some implementations, the advertiser 102 can, directly or indirectly, enter, maintain, and track advertisement information in the advertisement management system 104. The advertisements can be in the form of graphical advertisements, such as banner advertisements, text only advertisements, image advertisements, audio advertisements, video advertisements, or advertisements combining one or more of such components, or any other type of electronic advertisement document 120. The advertisements may also include embedded information, such as links, meta-information, and/or machine executable instructions, such as HTML or JavaScript™.

End users 300 can use end user devices (e.g., 108 a or 108 b, collectively referenced as 108) to submit page content requests 109 to publishers 106 or the search engine 112. In some implementations, page content 111 can be provided to the end user device 108 in response to the request 109. The page content can include advertisements provided by the advertisement management system 104, or can include executable instructions, e.g., JavaScript™ instructions, that can be executed at the end user device 108 to request advertisements from the advertisement management system 104. Examples of the end user devices 108 include personal computers, mobile communication devices, and television set-top boxes.

The advertisements can be provided from the publishers 106. For example, the publisher 106 can submit advertisement requests for advertisements to the system 104. The system 104 responds by sending the advertisements to the requesting publisher 106 for placement on one or more of the publisher's web properties (e.g., websites and other network-distributed content). The advertisements can include embedded links to landing pages, e.g., pages on the advertiser's websites, that an end user is directed to when the end user clicks an ad presented on a publisher website. The advertisement requests can also include content request information. This information can include the content itself (e.g., page or other content document), a category corresponding to the content or the content request (e.g., arts, business, computers, arts-movies, and arts-music), part or all of the content request, content age, content type (e.g., text, graphics, video, audio, and mixed media), and geo-location information.

In some implementations, a publisher 106 can combine the requested content with one or more of the advertisements provided by the system 104. The combined requested content and advertisements can be sent to the end user device 108 that requested the content as page content 111 for presentation in a viewer application (e.g., a web browser or other content display system). The publisher 106 can transmit information about the advertisements back to the advertisement management system 104, including information describing how, when, and/or where the advertisements are to be rendered (e.g., in HTML or JavaScript™).

The publishers 106 can use general content servers that receive requests for content (e.g., articles, discussion threads, music, video, graphics, search results, web page listings, and information feeds), and retrieve the requested content in response to the request. For example, content servers related to news content providers, retailers, independent blogs, social network sites, or any other entity that provides content over the network 110 can be used by the publisher 106.

In this document, the terms publisher, advertiser, and agent, depending on context, can refer either to the human publisher, advertiser, and agent, or to computers operated by the publisher, advertiser, and agent, respectively.

The advertisements can be provided through the search engine 112. The search engine 112 can receive queries for information, and in response, the search engine 112 can retrieve relevant search results from an index of documents (e.g., web pages). Search results can include, for example, lists of web page titles, snippets of text extracted from the web pages, and hypertext links to the web pages, and may be grouped into a predetermined number of search results.

The search engine 112 can submit a request for advertisements to the system 104. The request may include a number of advertisements desired. This number may depend on the search results, the amount of screen or page space occupied by the search results, and the size and shape of space reserved for the advertisements. The request for advertisements may also include the query (as entered, parsed, or expanded), information based on the query (such as geo-location information, whether the query came from an affiliate and an identifier of such an affiliate), and/or information associated with, or based on, the search results. Such information may include, for example, identifiers related to the search results (e.g., document identifiers), scores related to the search results (e.g., information retrieval (IR) scores), snippets of text extracted from identified documents (e.g., web pages), full text of identified documents, and feature vectors of identified documents. In some implementations, IR scores can be computed from, for example, dot products of feature vectors corresponding to a query and a document, page rank scores, and/or combinations of IR scores and page rank scores.
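As a rough illustration of an IR score computed from a dot product of feature vectors, the sketch below represents the query and the document as sparse term-weight dictionaries. The function names, the term weights, and the linear blend with a page rank score (including the `alpha` weight) are assumptions made for the example; the text does not specify how the scores are combined.

```python
def dot_product(query_vec, doc_vec):
    """Dot product of two sparse feature vectors stored as {term: weight}."""
    return sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())

def combined_score(ir_score, page_rank, alpha=0.7):
    """Hypothetical linear blend of an IR score and a page rank score."""
    return alpha * ir_score + (1.0 - alpha) * page_rank

# Hypothetical feature vectors for a query and a document.
query = {"laptop": 1.0, "battery": 0.5}
document = {"laptop": 0.8, "battery": 0.4, "review": 0.3}

ir_score = dot_product(query, document)
score = combined_score(ir_score, page_rank=0.5)
```

Terms that appear only in the document (such as "review" here) contribute nothing to the dot product, since the query assigns them no weight.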

The search engine 112 can combine the search results with one or more of the advertisements provided by the system 104. The combined information can be forwarded to the end user device 108 that requested the content as the page content 111. The search results can be maintained as distinct from the advertisements, so as not to confuse the end user 300 between paid advertisements and search results.

The advertisers 102, end user devices 108, and/or the search engine 112 can also provide usage information to the advertisement management system 104. This usage information can include measured or observed end user behavior related to advertisements that have been served, such as, for example, whether or not a conversion or a selection related to an advertisement has occurred. The system 104 performs financial transactions, such as crediting the publishers 106 and charging the advertisers 102 based on the usage information. Such usage information can also be processed to measure performance metrics, such as a click-through rate (CTR) and a conversion rate.

A click-through can occur, for example, when an end user selects or clicks on a link to a content item returned by the publisher or the advertisement management system. The CTR is a performance metric that is obtained by dividing the number of end users that clicked on the content item, e.g., a link to a landing page, an advertisement, or a search result, by the number of times the content item was delivered. For example, if a link to a content item is delivered 100 times, and three persons clicked on the content item, then the CTR for that content item is 3%. Other usage information and/or performance metrics can also be used.
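The CTR arithmetic in the example above can be written out as a short sketch. The function name `click_through_rate` is hypothetical.

```python
def click_through_rate(clicks, deliveries):
    """Return the CTR as a fraction: clicks divided by deliveries."""
    if deliveries <= 0:
        raise ValueError("deliveries must be positive")
    return clicks / deliveries

# The example from the text: delivered 100 times, clicked by three persons.
ctr = click_through_rate(3, 100)
ctr_percent = round(ctr * 100, 2)
```

With these inputs the fraction is 0.03, i.e., the 3% stated in the text.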

A “conversion” occurs when an end user consummates a transaction related to a previously served advertisement. What constitutes a conversion may vary from case to case and can be determined in a variety of ways. For example, a conversion may occur when an end user clicks on an advertisement, is referred to the advertiser's web page, and consummates a purchase there before leaving the web page. A conversion can also be defined by an advertiser to be any measurable or observable end user action such as, for example, downloading a white paper, navigating to at least a given depth of a website, viewing at least a certain number of web pages, spending at least a predetermined amount of time on a website or web page, or registering on a website. Other actions that constitute a conversion can also be used.

The advertisements, associated usage data, and other related parameters can be stored as advertisement data in an advertisement data store 114. The advertiser 102 can further manage the serving of advertisements by specifying an advertising campaign. The advertising campaign can be associated with campaign data stored in a campaign data store 116, which can, for example, specify advertising budgets for advertisements, and when, where and under what conditions particular advertisements may be served for presentation. For example, a computer company may design an advertising campaign for a new laptop computer that is scheduled to be released on November 20. The advertising campaign may have a budget of $500,000, and may have 30 different advertisements that are to be served for presentation during the month of November. Such data defining the advertisement campaign can be stored in the campaign data store 116.

The advertisement management system 104 includes a marketing platform 130 that enables the advertiser or other users of the system 104 to measure and review effectiveness of an advertisement 120 and a campaign 122 for the advertisement 120. The marketing platform can be network based and be shared by authorized users to obtain information about the effectiveness of the ad and ad campaign before, during, or after the completion of the campaign. For example, the authorized users can share and exchange information about effectiveness of the ad 120 and campaign 122 dynamically during the campaign 122 to enable the users to adjust the campaign 122 or ad 120 to improve the effectiveness based on the measured results.

The marketing platform 130 can include a multi-layer structure in which one or more lower layers collect, exchange, and/or analyze data associated with the ad and ad campaign, and one or more higher layers present visualized results to the users of the platform 130. The lower layers may include, for example, an evaluation tool 132 that obtains online and offline data that are associated with the effectiveness of the ad 120 and the campaign 122 and can be stored in evaluation data 134. The higher layers may include, for example, a dashboard 136 that presents the results of the effectiveness measurement in real time to the users through a user interface. For example, the results of the effectiveness measurement can be presented through the dashboard 136 while the ad campaign is on-going. The effectiveness measurement data can be presented shortly after raw data used to determine the ad effectiveness is collected and processed. It is not necessary to wait until the end of the ad campaign to obtain the results. In some implementations, the dashboard 136 can be a webpage accessible through a network, e.g., the Internet, by the authorized users using a network address, e.g., an IP address of the webpage.

Marketing Platform and Dashboards

Referring to FIG. 2A, in some implementations, the marketing platform 130 includes a four-layer structure. The top layer includes the dashboard 136 that interfaces with the users (e.g., advertisers 102 or agents 302) through external application programming interfaces (APIs) 131. The dashboard 136 presents its content 133, which may include information collected by gadgets 135, to the users in a visualized format. The top layer is supported by an infrastructure 137 that processes data from the other layers and presents the data through, e.g., a graphical user interface.

The three supporting layers include a first layer 138 that obtains raw data associated with the effectiveness of the ad or ad campaign being evaluated. For example, the first layer 138 may include logs 144 that contain raw data (e.g., unedited data) from search logs 146 having information about searches, analytics logs 148 having data gathered by analytical tools, ads logs 150 having information about which ads were served, toolbar logs 152 having information about statistics gathered by toolbars, and other sources.

The first layer 138 also includes third party databases 154 that contain data indirectly related to the ad or ad campaign. For example, the raw data in the third party databases 154 include shareable third party data 162, sensitive third party data 164, private third party data 166, and other proprietary data such as retail data, marketing spend across vendors and media, proprietary cubes, and proprietary logs.

For example, the shareable third party data 162 can include public information related to a third party, or data that the third party is willing to share with others, either freely or through license agreements. Some companies that use the advertisement management system 104 may not mind sharing certain information, such as the ad creatives that were used and the number of impressions received.

The sensitive third party data 164 can include confidential information about a third party that the third party allows the advertisement management system 104 to access, but the data is confidential and should not be shared with others. For example, the ad management system 104 may provide an infrastructure that allows the third party to conveniently store and manage the sensitive data. The system 104 may provide tools that process the sensitive data to generate various reports useful to the third party. The gadgets 135

The private third party data 166 can include confidential information of the user of the advertisement management system 104 (here, the “third party” refers to the user of the system 104).

The first layer 138 can include a data security mechanism to allow authorized access to confidential data and prevent unauthorized access to the confidential data. Some of the raw data may include personal information. For example, the first layer 138 can also have privacy preserved through obfuscation of individually identifying information or other personal information, through the introduction of noise into the raw data, or through other privacy protecting mechanisms.
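The two privacy mechanisms named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism used by the first layer 138: the function names, the keyed-hash approach to obfuscation, and the choice of Gaussian noise are all hypothetical, and the sketch makes no formal privacy guarantee.

```python
import hashlib
import hmac
import random


def obfuscate_id(raw_id: str, secret_key: bytes) -> str:
    """Replace an identifying value with a keyed hash (assumed approach).

    The same input and key always give the same token, so records can
    still be grouped, but the original value is not stored.
    """
    return hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


def add_noise(count: float, scale: float, rng: random.Random) -> float:
    """Perturb a raw count with zero-mean Gaussian noise of the given scale."""
    return count + rng.gauss(0.0, scale)


# Example use with placeholder values.
key = b"example-key-not-for-production"
token = obfuscate_id("203.0.113.7", key)
noisy_impressions = add_noise(1500, scale=5.0, rng=random.Random(42))
```

The key would need to be kept secret by the operator; anyone holding it could recompute the token for a guessed identifier.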

In addition to online data, the first layer 138 of the marketing platform 130 can also include offline data. The offline databases can include data associated with, for example, TV campaigns or radio campaigns launched, for example, with the operator of the advertisement management system 104.

A second layer 140 analyzes the raw data from the first layer 138 and provides insights, e.g., trends, into the data. For example, the second layer 140 may include products that interact with the first layer 138 and create specialized databases and front ends based on the raw data. Examples of the products include a search trends tool 156 that offers insights into the search logs 146, an analytics tool 158 that provides information about the analytics logs 148, and an ad metrics tool 160 that combines the toolbar logs 152 with ads logs 150 to measure effectiveness of online advertising. The products may also include third party applications 168 to analyze the raw third party data.

A third layer 142 generates reports 172 on the effectiveness of the ad 120 and campaign 122 based on the analyzed data and insights provided by the second layer 140. APIs 174 interface the second and third layers to allow widgets 176 contained in the third layer 142 to access the analyzed data and insights from the second layer 140. The third layer 142 combines and organizes analyzed data from different sources to produce the reports 172 to be presented to the users. The reports 172 allow the effectiveness of the ad and ad campaign to be easily visualized and evaluated. For example, the reports 172 can be charts, text, diagrams, graph curves, or other formats.

In some implementations, the top layer containing the dashboard 136 presents one or more reports 172 generated in the third layer 142 in one or more web pages shown to the user. In FIG. 2B, a dashboard 180 in the form of an online webpage contains one or more gadgets 182, 184, . . . , 200, each providing visualization of the reports 172 of FIG. 2A.

For example, gadgets can be HTML or JavaScript applications that can be embedded in web pages or other applications. For example, the gadgets can process data into a visualized format. For example, the gadgets can enable data sharing through the web pages or other applications.

In the example shown in FIG. 2B, the gadget 182 shows a report on correlation between marketing spend and the effectiveness of the ad and ad campaign, which is generated based on, e.g., a combination of the effectiveness data from databases 144 and the private third party data 166. The gadgets 184, 186, 192, 194 present a calendar, custom skins 202, a reading list, and news, respectively, each presenting information based on the shareable third party data 162. The gadgets 188 and 196 contain reports on performance against goal and trend, respectively. The reports are generated based on the data in the databases 144. The gadget 198 contains information about competing companies 204, which can be generated based on the sensitive third party data 164. In some examples, a gadget can be configured to enable a user to selectively view a portion of the data output from the gadget. The user can select the criteria for which relevant data is displayed. Such criteria can include, for example, a time period or a particular third party. For example, the gadget 188 or 196 can show performance against goal and trend within a user selected period of time, instead of all data output from the gadget.

The gadgets (e.g., 182, 184, etc.) each update the presented information in real time, either manually or automatically. For example, the first layer 138 and second layer 140 of FIG. 2A can be configured to automatically update the databases 144 and 154 (e.g., at a pre-selected time interval), and analyze the updated raw data in the databases. The gadgets of the third layer 142 and the top layer exchange data with the second layer 140 through the API 174, e.g., by importing data from or exporting data to the second layer 140 to generate the reports 172 and visually present the reports to the user in real time using the updated data. In some implementations, the user can manually update the information provided by the gadgets by, for example, refreshing the dashboard 180 or reopening the dashboard 180.

In some implementations, the gadgets (e.g., 182, 184, 186, etc.) may communicate with each other through the API 174 so that when the information in one gadget changes, other gadgets that contain or use this information are updated. For example, if the database 144 obtains new raw data regarding the performance of the ad 120, the gadget 196 obtains the analyzed new raw data through the API 174 and presents an updated trend. At the same time, the gadget 188 updates the visual graph for performance against goal. As another example, a gadget containing campaign details can be included in the dashboard 180 such that when parameters of the campaign are updated, the updated parameters can be automatically populated to the other gadgets (e.g., gadget 188) that are using the parameters. Other types of communications are also possible.
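One way to picture this kind of propagation is a publish/subscribe arrangement, sketched below. The `GadgetBus` and `Gadget` classes and the topic names are hypothetical and stand in for whatever the API 174 actually provides; the description does not prescribe this design.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List


class GadgetBus:
    """Hypothetical stand-in for the API 174: routes updates by topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str, Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str, Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, value: Any) -> None:
        # Deliver the new value to every gadget that uses this topic.
        for callback in list(self._subscribers.get(topic, [])):
            callback(topic, value)


class Gadget:
    """A gadget that keeps the latest value of each topic it depends on."""

    def __init__(self, name: str, bus: GadgetBus, topics: Iterable[str]) -> None:
        self.name = name
        self.state: Dict[str, Any] = {}
        for topic in topics:
            bus.subscribe(topic, self._on_update)

    def _on_update(self, topic: str, value: Any) -> None:
        self.state[topic] = value  # a real gadget would also redraw its view


bus = GadgetBus()
goal_gadget = Gadget("performance-against-goal", bus, ["campaign.goal", "ad.performance"])
trend_gadget = Gadget("trend", bus, ["ad.performance"])

# A campaign-details gadget changes a parameter; dependents update.
bus.publish("campaign.goal", 10_000)
bus.publish("ad.performance", 7_250)
```

After the two `publish` calls, both gadgets hold the new performance figure, and only the goal gadget holds the campaign goal, because only it subscribed to that topic.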

In addition to presenting visualized real-time effectiveness data to the user, the dashboard 136 also provides a platform for exchanging data (e.g., raw data from databases 144 and 154 of FIG. 2A) and results of analyses (e.g., analyzed data from the insight layer 140) between the information requesters (e.g., the users of the dashboard 136) and the information providers (those who provide the online or offline raw or analyzed data). Such an information exchange platform can facilitate monetization of the exchange of information. For example, the information provider can charge for the use of the raw data or analyzed data each time the gadgets of the dashboard 136 import the data.

The information exchange platform can also facilitate communication between the information requesters and providers, and help the requesters and the providers find each other and collaborate on projects. For example, when creating the dashboard 136 and selecting the gadgets, the information requesters become exposed to multiple information providers and can choose the ones that provide information of interest to them. The information providers can also market themselves more effectively through the platform.

The dashboard 136 can be created by, e.g., the advertisers 102 or the agents 302 in FIG. 1. In some implementations, when an advertiser 102 requests an agent 302 to prepare an advertisement to be published by the publishers 106, in addition to the requested ad, the agent 302 also creates a dashboard 136 online for the ad 120 and the associated ad campaign 122. The agent 302 can deliver a link to the dashboard 136 to the advertiser 102 so that the advertiser 102 can access the data regarding the effectiveness of the ad 120 and the campaign 122 prior to the end of the campaign.

In some implementations, the advertisement management system 104 executes code that implements the dashboards, and advertisers 102 access the dashboards using, e.g., web browsers through the links provided by the agents 302. The outputs of the dashboards can be provided as interactive web pages shown on the computers of the advertisers 102. The advertiser 102 and the agent 302 can exchange or share live, real-time information about the effectiveness through the dashboard 136. It is not necessary to wait for the completion of the campaign in order to gather and analyze relevant data. The real-time information can also enable the agent 302 and the advertiser 102 to adjust the strategy of the campaign and modify the ad or ad campaign to improve the effectiveness of the ad prior to the end of the campaign.

In some implementations, the gadgets may have client-side code that allows the gadget to access private data (e.g., ad revenue data) residing on the computers of the advertisers 102. The advertiser 102 may not wish to share such data with the operator of the advertisement management system 104, and thus does not upload such data to the advertisement management system 104. The gadget may have security measures to prevent unauthorized access to the private data. For example, the gadget may request a user name and a password from the advertiser 102.

In some implementations, various gadgets may communicate with one another and exchange private data of the advertiser 102. Each gadget may request the advertiser 102 to provide a user name and password, and only gadgets with proper credentials may receive the private data. Alternatively, the dashboard may request the advertiser 102 to enter a user name and password, and gadgets that are designated secure gadgets by the advertiser 102 may receive private data. The dashboard may provide a user interface to allow the advertiser 102 to modify the list of gadgets that can receive private data. Different gadgets may have different security levels and have different access levels with respect to different types of private data. In some implementations, the gadgets may combine data provided by the advertisement management system 104 with private data of an advertiser 102 to generate useful information.
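The per-gadget, per-data-type access described above can be sketched as a small allow list maintained by the dashboard. The `PrivateDataGate` class, its method names, and the gadget identifiers are hypothetical; the sketch assumes the advertiser 102 has already been authenticated and covers only the authorization step.

```python
from typing import Any, Dict, Set


class PrivateDataGate:
    """Hypothetical allow list: which gadget may read which private data type."""

    def __init__(self, private_data: Dict[str, Any]) -> None:
        self._data = dict(private_data)
        self._grants: Dict[str, Set[str]] = {}

    def grant(self, gadget_id: str, data_type: str) -> None:
        # The advertiser designates a gadget as allowed for one data type.
        self._grants.setdefault(gadget_id, set()).add(data_type)

    def revoke(self, gadget_id: str, data_type: str) -> None:
        self._grants.get(gadget_id, set()).discard(data_type)

    def read(self, gadget_id: str, data_type: str) -> Any:
        if data_type not in self._grants.get(gadget_id, set()):
            raise PermissionError(f"{gadget_id} may not read {data_type}")
        return self._data[data_type]


gate = PrivateDataGate({"ad_revenue": 125_000.0, "retail_sales": 9_800})
gate.grant("gadget-182", "ad_revenue")
revenue = gate.read("gadget-182", "ad_revenue")  # allowed
```

A gadget without a grant, or with a grant for a different data type, receives a `PermissionError` instead of the data. Calling `revoke` corresponds to the advertiser removing a gadget from the list of gadgets that can receive private data.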

Users accessing the dashboard 136 online can be authenticated to prevent unauthorized access of the dashboard 136. For example, the authentication process can include requesting the users to enter a user name and a user password. In some implementations, the creator of the dashboard 136, e.g., the advertiser 102, sets up the security information for the dashboard 136. For example, the agent 302 may authorize selected users (e.g., advertisers 102 who are clients of the agent 302) to access the dashboard 136, set up authentication information for the selected users, and deliver the authentication information to the selected users to enable the selected users to access the dashboard 136.

Depending on the application and information required by the advertiser 102, the dashboard 136 can include various combinations of gadgets to provide the required information, e.g., metrics that are useful to the advertiser 102. When the user (e.g., advertiser 102 or agent 302) creates a customized dashboard that includes gadgets, the user can create his own gadgets, use gadgets provided by the operator of the advertisement management system 104, or use gadgets acquired from a gadget pool.

The gadget pool refers to a group of gadgets that are accessible to users. Some of the gadgets in the gadget pool may be free, and some may be available for purchase. A web portal may be provided to allow users to access the gadget pool, view descriptions of the gadgets, and optionally download demo versions of the gadgets.

The dashboard 136 described above can be created online using predefined skins, or created offline using a template and uploaded online for later use. Skins allow a user or developer to control the appearance of the dashboard, by supplying a set of formatting instructions and graphical elements that can be used to supplement or replace default elements used to format the dashboard when a skin is not applied. Skins may affect, e.g., font styles and sizes, colors, borders, backgrounds, images, and other design elements of the dashboard. A creator of the dashboard 136 can access the skins or the template through a user interface, e.g., a computer screen, and add and arrange gadgets on the skins or the template. Skins can also be used to control the appearances of gadgets, for example, to cause the gadgets within a dashboard to have a similar theme or style.

FIG. 3 shows an example user interface for generating a dashboard 208. For example, the dashboard 208 can be generated online using predefined skins provided by the operator of the system 104. The skins each supply a set of formatting instructions and graphical elements that define the look and feel of the gadgets. The creator of the dashboard 208 can design the graphical appearance of the gadgets. The dashboard 208 can also be created using other online templates.

In some implementations, the dashboard 208 can include one or more pages each having one or more gadgets. One page is shown at a time, and the pages not shown can be accessed through tabs (e.g., 212). Additional pages can be added by clicking on the “add a tab” button 210. Gadgets can be added to any of the pages. In the example shown in FIG. 3, gadgets 214, 216, . . . , 226, etc., are added to the page that corresponds to the “Campaign Effectiveness” tab 212. The user can arrange the positions of the gadgets in the dashboard. Each of the gadgets can provide information about effectiveness of the ad and the ad campaign using graphics or text. For example, the gadget 214 presents campaign information, such as campaign IDs 228 and campaign sites 230. Various effectiveness information is presented by the gadgets 216-226 including, for example, effectiveness by site measured by audience score 232. Some gadgets can also provide additional information to help design and control the campaign. The additional information can include, for example, education levels of the audience of the campaign, as shown in gadget 226, or the number of unique users 234 for particular sites, as shown by the gadget 222. Some of the additional information can be collected by polls or surveys provided online in association with the ad.

Other gadgets can also be used in the dashboard 208, such as a gadget that shows a calendar, a gadget that shows search trends, and gadgets that show news reports, as well as the gadgets shown in FIG. 2B. The gadgets in the dashboard 208 can be arranged in one or more pages. For example, when the gadgets cannot fit within one page, the user can add an additional page using the button 210.

Offline information can be valuable to the measurement of effectiveness of the ad and ad campaign. Referring to FIG. 4, for example, a “TV Effectiveness” gadget 274 pulls real-time, second-by-second effectiveness data from set-top boxes by channel or by commercial. A “Radio Effectiveness” gadget 276 displays relevant search data alongside campaign activity as a measure of effectiveness. For example, relevant data can be obtained by surveying listeners of radio programs. In addition, the “Campaigns” gadget 278 displays information obtained across TV and radio, for example, the expense of each campaign along with a measure of effectiveness derived from the TV effectiveness gadget 274 and the radio effectiveness gadget 276.

Referring to FIG. 5, as an example, a page 240 of the dashboard under a “brand health” tab 238 can include gadgets presenting both online and off-line information. For example, the gadgets 244 and 246 present off-line information from media, e.g., TV, radio, or printed publication. In the page 240, a gadget 242 provides information about the brands of interest, such as brand names 248, brand sites, product types, SKUs, etc. A gadget 244 presents information about the effectiveness of recommendation from various media, such as Internet 250, TV 252, radio 254, and magazine 256. The information can be collected by polls, organized and provided by other providers, and accessible using, for example, the online searching conducted by the first layer 138 (FIG. 2A). Another gadget 246 provides a chart that shows effects on user purchase behavior during the ad campaign. Information about the purchase can be obtained from, e.g., retailers.

A “web brand alert” gadget 243 and a “search brand trends” gadget 245 can provide online effectiveness measurement information. For example, the “web brand alert” gadget 243 presents the number of negative comments versus positive comments on the brand (or ads or products associated with the brand), obtained from various web pages. The “search brand trends” gadget 245 automatically shows the search volumes for one particular product of the brand, e.g., the positive versus negative comments on the product.

Additional pages can be added to the dashboard 208 to provide the user with additional information regarding the effectiveness of the ad and the campaign. For example, referring to FIG. 6, a “competitive tracking” page 260 can be added that allows the user to track its own brand as well as other competing brands. The “competitive tracking” page 260 can include a “brand detail” gadget 261 in which a user can enter its own brand and its competitors' brands to be tracked. The details of the brands from the gadget 261 are populated to other gadgets on the page 260 so that the other gadgets automatically collect information related to the brands. The page 260 can include brand health information to allow comparison with the competitors' brands. For example, a “web brand alert” gadget 262 automatically subscribes to all posts posted online that are related to the advertiser's brand. The second layer 140 of the marketing platform of FIG. 2A analyzes data from the posts to enable the gadget 262 to present alert information, for example, percentage of negative comments versus positive comments from all subscribed posts. The page 260 can include a “search brand trends” gadget 264 that automatically shows percentage of negative comments versus positive comments from the posts, about one particular model of the advertiser's brand.

Information about the competitors' brands can be obtained from a “competitive searches” gadget 266 and a “competitive sites” gadget 268 on the page 260. For example, the competitive searches gadget 266 shows the difference in the number of queries for competitive search terms between test and control groups as a result of exposure to the ads of the advertiser's brand; and the competitive sites gadget 268 shows the difference in the number of website visits between the test and control groups as a result of exposure to the ads of the advertiser's brand.
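The test-versus-control comparison above reduces to simple arithmetic, sketched below. Dividing each count by its group size before subtracting is an assumption added here so that groups of different sizes can be compared; the description speaks only of a difference in the number of queries or visits. The function name and the sample figures are hypothetical.

```python
def group_difference(test_count: int, test_size: int,
                     control_count: int, control_size: int) -> float:
    """Per-user rate in the exposed (test) group minus the control group.

    A positive result means exposed users issued more competitive
    queries (or made more site visits) per user than unexposed users.
    """
    if test_size <= 0 or control_size <= 0:
        raise ValueError("group sizes must be positive")
    return test_count / test_size - control_count / control_size


# Placeholder figures: 120 competitive queries from 1,000 exposed users
# versus 90 competitive queries from 1,000 unexposed users.
query_difference = group_difference(120, 1000, 90, 1000)
```

The same function applies to the competitive sites gadget 268 by passing visit counts instead of query counts. Whether an observed difference is statistically meaningful is a separate question that this sketch does not address.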

The page 260 can include other gadgets not shown in the figure, for example, a brand recall gadget or a brand affinity gadget that shows survey results for people who visit competitive sites versus those who visit sites of the advertiser's brand. The information obtained from the various gadgets discussed above can also be used in marketing research.

The effectiveness measurement of an ad and ad campaign using the dashboard described above substantially relies on the functions of the gadgets included in the dashboard. To serve particular goals of the dashboard and the effectiveness measurement, a user may have to create its own gadget.

Gadget Marketplace

Referring to FIG. 7, in some implementations, a gadget marketplace can be implemented by providing a storage 310 to allow users (e.g., 312, 314, 316) to upload gadgets (e.g., 284, 286, and 288) through a network 280 to form an online gadget pool 282. The storage 310 can be, e.g., one or more hard disk drives in one or more server computers. The gadgets in the gadget pool 282 can be made available, for free or for a fee, through the network 280 to the public or authorized users (which can include, e.g., 312, 314, and 316).

The creators of some of the gadgets in the gadget marketplace can charge fees for the use of their gadgets. The gadget pool 282 can include an authentication mechanism 290 and a payment mechanism 292 to enable secure trading of the gadgets. The authentication mechanism 290 can, for example, allow a new user to open an account with a user name, a password, and optionally, other identity certification information, e.g., credit card information. A user can access the pool 282 using his or her account information. The payment mechanism 292 can allow online payment using, for example, a credit card, a debit card, a bank card, a gift card, or other payment methods. The fee can be charged based on, for example, the use of the gadget(s), e.g., a certain fee per ad campaign.

Some gadgets, for example, gadgets 284, 286, in the gadget pool 282 can interact with one another, and such gadgets can be grouped in one or more subgroups, for example, subgroups 294, 296. In the example of subgroup 294, at least a portion of information contained in or otherwise available to the gadget 284 is shared with the other gadget 286 so that when the information is updated in the gadget 284, the updated information is populated to the gadget 286, e.g., by use of an API 298 a. Similarly, when information contained in the gadget 286 is shared by the gadget 284, updating the information contained in the gadget 286 will also cause the updated information to be transmitted to the gadget 284, e.g., by use of an API 298 b. Other examples of the gadgets that interact with one another are also provided in FIGS. 3 and 4.

In some implementations, when a user accesses the gadget pool to purchase or download a gadget in a subgroup, the gadget pool automatically presents one or more related gadgets to the user. In some implementations, the fee for use of two or more related gadgets in a subgroup can be calculated differently from the use of individual gadgets in order to encourage the use of the related gadgets.

In addition to the gadgets that are used for measurement of the effectiveness of an ad or ad campaign, the gadget pool can also include a variety of gadgets for other uses. For example, in connection with the ad or ad campaign, the gadget pool can include a first gadget that is built based on an econometric model and other types of modeling algorithms that use basic company and market data to provide recommendations on the mix of various types of media to use for the marketing of a product. For example, the gadget can, prior to or during the campaign, provide a recommendation on what percentage of the advertising budget should be spent in various media, e.g., online, TV, radio, and/or print media.

A creator of the media gadget can take basic company data along with inputs such as product lifecycle and product category, and produce a cross-media mix allocation recommendation. The creator can also create a second, more sophisticated gadget that takes current campaign data from various resources and uses the data to enhance the results of the cross-media mix recommendations both within one particular medium and across media. The accuracy of the predictions of the modeling gadgets can be measured using the campaign effectiveness dashboard containing effectiveness measuring gadgets, as described above.

Various marketing methods can be applied in selling the gadgets in the gadget pool 282. In the example of the media mix gadgets discussed above, to promote marketing of the second gadget, the creator of the first and second gadgets can offer the first gadget for free to attract customers. Customers can freely try the functions of the first gadget and determine whether they wish to purchase the second gadget, which has enhanced features.

Examples of Gadgets

As described above, the dashboard 136 may show various gadgets 135 that analyze and present information useful for evaluating performances of advertisements and ad campaigns. The gadgets 135 may be written by the users of the dashboard 136 or obtained from the gadget marketplace. The gadgets 135 can include client-side code that resides on a client (e.g., computer of an advertiser 102) and can access private data of the client and sharable data hosted at the advertisement management system 104. The gadgets 135 can also include server-side code that resides on a server (e.g., advertisement management system 104) that utilizes the data processing power of the system 104 and processes a vast amount of data hosted on the system 104.

The following describes an example gadget that collects information related to the timing of both a content presentation on a web page and a web page access by a user, and uses the collected information to determine if the user accessed the web page while the content was presented on the web page. By knowing whether certain users who accessed certain web pages have viewed a particular ad or otherwise acted upon the ad, it may be possible to analyze user behavior and determine the effectiveness of the ad, e.g., by comparing on-line behaviors of users who have viewed the ad with on-line behaviors of users who have not viewed the ad.

In some implementations, the gadget may merge records from browsing logs provided by page link analysis tools and content logs provided by a content server. The browsing logs may provide information about the browsing histories of end users, such as when and what web pages were accessed by the end users. The content server can be, e.g., an ad server, and the content log can provide information about when and where ads were served. The timing of events recorded in the browsing logs and the content log may not match exactly. The gadget may merge the two logs by determining time intervals, each time interval being between a time point related to when a web page is accessed on a device and another time point related to when a content item is displayed on the accessed web page, and comparing the time intervals with one or more threshold values.

Referring to FIG. 8, an example information system 1100 is shown for obtaining and analyzing information related to web pages accessed by users and content displayed on the accessed web pages.

Information is exchanged through a network 1110 between a content server 1101, computers (e.g., computer 1104 a, laptop 1104 b, cell phone 1104 c, computer 1104 d) that are each associated with a user (e.g., user 1106 a, 1106 b, 1106 c, 1106 d), a page link analysis server 1102, web site publishers that host web sites on web servers (not shown), and a collector-analyzer 1108. The collector-analyzer 1108 can include two components: a collector that collects data and an analyzer that analyzes data.

In some examples, a user (e.g., user 1106 a, user 1106 b) accesses web pages using a web browser 1122, such as Firefox®, Microsoft® Internet Explorer (MSIE), Safari®, or Chrome, that is installed on the computer (e.g., computer 1104 a, computer 1104 b). The computer can use an application program, such as a page link analysis tool 1125, to evaluate the accessed web pages while preserving privacy of a user. Information about the internet browsing session can be gathered, for example, the time and date the user 1106 d accesses a web page, the web page accessed (e.g., the URL), and a unique identification (ID) number. In general, the unique ID is not associated with personally-identifiable information of user 1106 d. The browsing information can be recorded and stored in a log 1124, which can be stored in the memory of the computer 1104 d.

For example, while browsing web pages, the user 1106 d can use a web toolbar that has a page link analysis feature enabled. As the user 1106 d visits various web pages, the page link analysis tool 1125 stores information in a browsing log 1124 that can include information such as universal resource locators (URLs) of the web pages, time stamps indicating when the user 1106 d visited the web pages, an Internet Protocol (IP) address associated with the user 1106 d, and a unique identification (ID) number that can be part of a cookie. As described above, the unique ID is generally not associated with personally-identifiable information of user 1106 d. The information, in part or as a whole, can be sent to a page link analysis server 1102, and combined with browsing logs 1124 from other users into aggregate browsing logs 1116. The information sent from individual computers can be filtered or anonymized to preserve the privacy of individual users.

The information stored in the browsing log 1124 associated with a user (e.g., user 1106 c, user 1106 d) can be sent to the page link analysis server 1102, to the collector-analyzer 1108, or both. The page link analysis server 1102 determines the page link analysis of the web pages associated with the URLs and sends the page link analysis results to the page link analysis tool 1125 associated with the user (e.g., user 1106 c). The page link analysis tool 1125 then displays the page link analysis results. The browsing log 1124 can be combined with other browsing logs 1124 to form aggregate browsing logs 1116, which can be stored on the page link analysis server 1102, on the collector-analyzer 1108 (e.g., in a memory 1128), or both. The aggregate browsing logs 1116 preferably cannot be traced to the personal identities of individual users. This ensures privacy of the users 1106 a-d.

The content server 1101 stores content 1112 (e.g., an advertisement) from content providers 1114. The content server 1101 can provide the stored content 1112 to a web site publisher. A web page that includes the content 1112 can be delivered from a web site publisher to users (e.g., users 1106 a-d) through the network 1110. When the content 1112 is shown on a web page, the time and the location (e.g., the URL) of the showing can be recorded in addition to which IP addresses accessed the web page while the content was shown. The recorded information can be stored, for example, in a content log 1118 on the content server 1101, or in a log on the web site server, or in both logs.

Neither the content log 1118 nor the browsing log 1124 by itself contains all the information needed to identify which users were exposed to the content 1112. This identification is important and can be used, for example, to help determine how on-line behaviors of users are affected by content (e.g., ads) that is presented. The behavioral determinations (e.g., how likely users who were presented with a content are to visit other web pages or to participate in financial or searching transactions) then can be used to adjust the content 1112 that is presented to other users who have similar demographics or determined behaviors. In some examples, the impact of an ad campaign can be assessed so that advertising money may be spent more effectively and so that users may receive more relevant ads.

The collector-analyzer 1108 of the information system 1100 merges the information in the content log 1118 and the aggregate browsing logs 1116 and creates merged data logs 1126. The merged data logs 1126, which can be created by performing operations on a processor 1130 and stored in a memory 1128 of the collector-analyzer 1108, then can be used to determine which users have accessed a web site while the content 1112 (e.g., an ad) was displayed on the web site.

Merging Data Logs

Referring to FIG. 9, as an example, the aggregate browsing logs 1116 include browse records 1140. Each browse record 1140 can include, e.g., a browsing timestamp 1150, a browsing IP address 1152, a user ID 1154, a URL 1156, as well as other information 1155 (e.g., a language used in the browser, a country in which the user is located, a version of software being used by the user, a screen size of a computer 1104 used for accessing websites). Multiple browsing timestamps 1150 and browsing IP addresses 1152 can be listed in the browse record 1140. The user ID 1154 can be, e.g., an identifier of the page link analysis tool 1125. The content log 1118 can include content records 1142. Each content record 1142 can include, e.g., a content timestamp 1158, a content IP address 1160, a content identifier 1162, as well as other information 1163.

The browse record 1140 and the content record 1142 can be merged based on matching the browsing IP address 1152 with the content IP address 1160 and the browsing timestamp 1150 with the content timestamp 1158 (within a predetermined window). A major complication is that the browsing timestamp 1150 may correspond to a different type of event than the type of event that is associated with the content timestamp 1158. It is possible that the clocks are misaligned; it is also possible that both clocks are accurate but record different events in serving ads.

The merged records 1144 correspond to the merged data logs 1126 and can be stored at the collector-analyzer 1108. Each merged record 1144 includes an interval 1164, an IP address 1166, an ID 1168, a URL 1170, a content ID 1172, and can include other information 1169. The interval 1164 is equal to a difference between the browsing timestamp 1150 and the content timestamp 1158. The IP address 1166 is the same as the browsing IP address 1152, which is the same as the content IP address 1160. The ID 1168 is the same as the user ID 1154, the URL 1170 is the same as the browse URL 1156, and the content ID 1172 is the same as the content identifier 1162.
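The merge described above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the collector-analyzer 1108: the class names, the function name merge_logs, the default window of 600 seconds, and the sign convention (browsing time minus content time, so that a negative interval means the browsing event came first) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BrowseRecord:            # hypothetical stand-in for browse record 1140
    timestamp: float           # browsing timestamp 1150, in seconds
    ip: str                    # browsing IP address 1152
    user_id: str               # user ID 1154
    url: str                   # URL 1156


@dataclass
class ContentRecord:           # hypothetical stand-in for content record 1142
    timestamp: float           # content timestamp 1158, in seconds
    ip: str                    # content IP address 1160
    content_id: str            # content identifier 1162


@dataclass
class MergedRecord:            # hypothetical stand-in for merged record 1144
    interval: float            # browsing time minus content time (assumed sign)
    ip: str
    user_id: str
    url: str
    content_id: str


def merge_logs(browse: List[BrowseRecord],
               content: List[ContentRecord],
               window_seconds: float = 600.0) -> List[MergedRecord]:
    """Pair records that share an IP address and fall within the window."""
    # Index content records by IP address so each browse record only
    # needs to be compared against records from the same address.
    content_by_ip: Dict[str, List[ContentRecord]] = {}
    for c in content:
        content_by_ip.setdefault(c.ip, []).append(c)

    merged: List[MergedRecord] = []
    for b in browse:
        for c in content_by_ip.get(b.ip, []):
            interval = b.timestamp - c.timestamp
            if abs(interval) <= window_seconds:
                merged.append(
                    MergedRecord(interval, b.ip, b.user_id, b.url, c.content_id))
    return merged
```

Every pairing inside the window is kept as a potential match; deciding which pairings are “true” matches is left to the later classification steps.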

In order to determine when a user was presented with the content 1112, a situation that will be referred to as a “true” match, the timings of the events are analyzed carefully. The conditions that make a “true” match more likely than a “false” match (i.e., a user was not presented with the content 1112) can be estimated by statistically analyzing the merged data logs 1126. A merged record 1144 indicates a “true” match when the IP addresses of the records 1140 and 1142 are the same and the interval 1164 is smaller than a predetermined threshold.

By analyzing the information in the merged data logs 1126, the collector-analyzer 1108 can provide information about the web browsing history of a user (e.g., the user 1106 d) before and after receiving the content 1112. Because the aggregate browsing logs 1116 include data collected from many users, the collector-analyzer 1108 can effectively compare users who have been presented with the content 1112 and users who have not been presented with the content 1112 and examine the differences in on-line behaviors of the two groups of users to infer the effectiveness of the content 1112.

Data Analysis and Match Classification

FIG. 10 shows a process 1200 that represents a sequence of operations performed by the collector-analyzer 1108 in analyzing the data in the merged data logs 1126. For example, the operations can be executed by the processor 1130 of the collector-analyzer 1108. In some embodiments, the operations can be executed by multiple processors present in the collector-analyzer 1108. Operations can include obtaining 1202 data (e.g., log 1124) that includes a first time (e.g., the browsing timestamp 1150) related to when a device (e.g., computer 1104 a) accessed a web page, and obtaining 1204 data (e.g., log 1116, log 1118) that includes a second time (e.g., the content timestamp 1158) related to when content was displayed on the accessed web page. The obtained data can optionally be used to estimate 1206 the browser type (e.g., Firefox®, Microsoft® Internet Explorer, Safari®, Chrome) that accesses the web page, the internet connection speed for the device, or both. In some examples, the connection speed can be fast (e.g., broadband technologies such as DSL, cable modems, VDSL, or optical fiber) or slow (e.g., a dial-up connection). In some examples, the browser type or the internet connection speed or both can be unknown.

An interval (e.g., the interval 1164) can be determined 1208 between the first time (e.g., the browsing timestamp 1150) and the second time (e.g., the content timestamp 1158). It can be determined 1210 whether the first time occurred before the second time. In some implementations, if the first time is determined to have occurred before the second time, a decision is made 1212 whether or not the interval is less than a threshold chosen for the browser type and the internet connection speed. If the interval is less than the threshold, the match is classified 1214 as a “true” match; if not, the match is classified 1216 as a “false” match. Likewise, if the first time is determined to have occurred after the second time, a decision is made 1218 whether or not the interval is greater than a threshold chosen for the browser type and the internet connection speed. If the interval is greater than the threshold, the match is classified 1214 as a “true” match; if not, the match is classified 1216 as a “false” match.

In some implementations, there can be two thresholds that define three regions: true, uncertain, and false. For example, if the interval is less than a first threshold, the event is classified as true. If the interval is larger than a second threshold, the event is classified as false. If the interval is between the two thresholds, the event is classified as uncertain. The two thresholds can be used to control both the probability of wrongly declaring true and the probability of wrongly declaring false.

In some embodiments, the results of the process 1200 can be accomplished by performing the described steps in a different order. In some embodiments, detection of true, false, or uncertain matches can be performed by adding steps to the data collecting and analyzing.

EXAMPLES

The following is an example that illustrates various steps that can occur in the system 1100 when web pages and content 1112 are delivered to users. For example, at 10:02 pm on Dec. 8, 2008, a service provider can deliver a web page (e.g., http://www.nytimes.com) to a computer (e.g., cell phone 1104 c) associated with a user 1106 c. Content 1112 (e.g., an advertisement by Neiman Marcus) can also be provided on the web page when the user accesses the web page. A record can be created in a log (e.g., browsing log 1124, aggregate browsing log 1116) that can include, for example, the time and date, the URL of the web page, a unique ID associated with the user 1106 c, and an IP address associated with the cell phone 1104 c, which can use a mobile IP, 3G, or other communication protocol. A record can also be created in a log (e.g., content log 1118) that can include, for example, an identifier of the content 1112 shown, the time and date the content was shown, the URL of the web page that displayed the content, and which IP addresses may have viewed the content. The browsing log 1124 typically does not contain information about the content 1112 shown on the web page.

Alternatively or in addition, a service provider can deliver (e.g., through a router 1120) a web page to one or more computers (e.g., computer 1104 a and laptop 1104 b) associated, respectively, with users 1106 a and 1106 b. A connection between the router 1120 and the computers can be wireless or through a hardware connection. In some examples, a computer 1104 a and a laptop 1104 b at a home or an office can share an IP address and use the same router 1120. Two users 1106 a and 1106 b can each access different websites at their respective computers within a short period of time (e.g., one second apart) while two different contents 1112 are presented on the web sites. In this example, the shared IP address will be recorded twice in the content log (e.g., content log 1118), each record being associated with the respective content 1112. Without additional information, any behavioral determinations (e.g., how likely users are to visit other web pages or to participate in financial or searching transactions) may not be correlated with the appropriate content 1112. Therefore, it is useful to combine the browsing log 1124 and the content log 1118 to provide information on which of the users 1106 a and 1106 b accessed which website and viewed which content 1112.

In another example, the computer 1104 a and the laptop 1104 b, which share an IP address and use the same router 1120, can be used, respectively, by users 1106 a and 1106 b. If a browsing log 1124 is associated with the user 1106 a and a separate browsing log 1124 is associated with the user 1106 b, differences in web behaviors can be recorded for each user. For example, if the user 1106 b accesses the web site on laptop 1104 b while the content 1112 is presented, but user 1106 a does not access a web site that displays the same content 1112, information can be recorded in logs (e.g., browsing log 1124, aggregate browsing log 1116, content log 1118) that specify these differences.

The collector-analyzer 1108 can determine which user at the shared IP address accessed the web site while the content 1112 was presented. The collector-analyzer 1108 makes determinations for anonymized users. However, IP addresses that are known to provide a large number of users with web access (e.g., universities, corporations) can be excluded from analysis by the collector-analyzer 1108. In some examples, historical data can be used to monitor the level of activity of IP addresses, and, if there is an inconsistency with previously monitored usage rates, the IP addresses can be excluded from analysis by the collector-analyzer 1108.
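One way such an exclusion could be sketched in Python is shown below. The text does not say how heavily shared IP addresses are recognized, so the criterion used here (counting distinct anonymized user IDs per IP address against a limit), the function names, and the default limit of 20 are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each record is an (ip_address, anonymized_user_id) pair; this shape is
# a hypothetical simplification of the browse records.
Record = Tuple[str, str]


def shared_ip_addresses(records: Iterable[Record],
                        max_users_per_ip: int = 20) -> Set[str]:
    """Return IP addresses seen with more distinct user IDs than the limit."""
    users_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, user_id in records:
        users_by_ip[ip].add(user_id)
    return {ip for ip, users in users_by_ip.items()
            if len(users) > max_users_per_ip}


def exclude_shared_ips(records: List[Record],
                       max_users_per_ip: int = 20) -> List[Record]:
    """Drop every record whose IP address exceeds the assumed user limit."""
    excluded = shared_ip_addresses(records, max_users_per_ip)
    return [r for r in records if r[0] not in excluded]
```

A check against historical usage rates, as mentioned above, would need per-address activity counts over time and is not covered by this sketch.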

The collector-analyzer 1108 can obtain a first data (e.g., browsing log 1124, aggregate browsing log 1116), which contains information related to when a user (e.g., user 1106 d) accessed a web page, and a second data (e.g., content log 1118), which contains information related to when a content (e.g., content 1112) was displayed on the accessed web page.

The first data can be merged with the second data to form a merged data log (e.g., merged data logs 1126), which can be stored in a memory 1128 of the collector-analyzer 1108 and analyzed by executing instructions in a processor 1130 of the collector-analyzer. Because the recorded events can be of different natures (e.g., when the content 1112 was presented versus when the user 1106 d accessed the web page), in order to determine when a user was presented with the content 1112, an occurrence that is also referred to as a “true” match, the timing of the events should be analyzed carefully.

The system 1100 can generate reports having information about how users' behaviors changed after presentation of the content 1112. The reports can be provided to a content provider 1114.

Classification of Matches and Derivation of a Threshold

Referring to FIG. 11, a process 1300 describes a sequence of operations for classifying each of a plurality of intervals that correspond to the time between two events as “true” (i.e., the two events are “matched”), “false” (i.e., the two events are “unmatched”), or “uncertain” (i.e., the two events are neither “matched” nor “unmatched” with confidence). The operations, performed by the collector-analyzer 1108, are executed typically by the processor 1132. In some embodiments, the operations can be executed by multiple processors present in the collector-analyzer 1108. For each of a plurality of devices (e.g., computers, cell phones) data can be obtained 1302 that includes a first time (e.g., the browsing timestamp 1150) related to when the respective device accessed a web page. Data can also be obtained 1304 that includes a second time (e.g., the content timestamp 1158) related to when content was displayed on the accessed web page. Based on the obtained data, an internet connection speed and a browser type can optionally be estimated 1306 for each device.

An interval can be determined 1308 for each device between the first time (e.g., the browsing timestamp 1150) and the second time (e.g., the content timestamp 1158). Each interval can be assigned 1312 a prior probability of corresponding to a “true” match (e.g., a user accessed the web page while a content was displayed on the web page). If the interval is larger than a predetermined value (e.g., about five minutes, about seven minutes, about 10 minutes, fractional values between five and 10 minutes), the interval can be excluded from analysis because it is unlikely to correspond to a “true” match. The prior probability is a marginal or unconditioned probability of a match and can be interpreted as a description of what is known about a variable in the absence of new data. The prior probability differs from a posterior probability, which is a conditional probability of the variable that considers the implications of new data. The posterior probability is computed from the prior probability and Bayes' theorem:

$P(\mathrm{true} \mid D) = \dfrac{P(\mathrm{true}) \cdot \mathrm{non\_uniform\_dist}(D \mid \mu, \sigma)}{P(\mathrm{true}) \cdot \mathrm{non\_uniform\_dist}(D \mid \mu, \sigma) + P(\mathrm{false}) / \mathrm{max\_D}}, \qquad (1)$

in which D represents an interval between the first and second times, P(true) represents the prior probability of a “true” match, non_uniform_dist represents a non-uniform distribution that has a mean μ and a standard deviation σ, P(false) represents the prior probability of a “false” match, and max_D represents the maximum distance between the first and second times. The mean μ and the standard deviation σ can be unknown and estimated iteratively (e.g., using an expectation-maximization algorithm, gradient descent method, Gauss-Newton method).
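As a rough numerical illustration of equation 1, the following Python sketch evaluates the posterior for a single interval. It assumes a lognormal density for non_uniform_dist (one of the non-uniform choices named elsewhere in this description), treats μ and σ as the mean and standard deviation of log(D), and takes P(false) = 1 − P(true); the function names and the parameter values used in the usage note are hypothetical.

```python
import math


def lognormal_pdf(d: float, mu: float, sigma: float) -> float:
    """Lognormal density at d, where mu and sigma describe log(d)."""
    if d <= 0.0:
        return 0.0
    z = (math.log(d) - mu) / sigma
    return math.exp(-0.5 * z * z) / (d * sigma * math.sqrt(2.0 * math.pi))


def posterior_true(d: float, prior_true: float,
                   mu: float, sigma: float, max_d: float) -> float:
    """Posterior probability of a "true" match for interval d (equation 1)."""
    # Numerator: prior of a "true" match times the non-uniform density.
    true_term = prior_true * lognormal_pdf(d, mu, sigma)
    # "False" matches are taken to be uniform on (0, max_d], so their
    # density is 1 / max_d; P(false) = 1 - P(true) is an assumption here.
    false_term = (1.0 - prior_true) / max_d
    return true_term / (true_term + false_term)
```

With illustrative values P(true) = 0.5, μ = 0, σ = 1, and max_D = 600 seconds, an interval of 1 second yields a posterior close to one, while an interval of 500 seconds yields a posterior close to zero.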

A “true” match can be considered as a sample from a distribution having certain parameters (e.g., a mean μ, a standard deviation σ). The parameters of this distribution can be estimated iteratively for a given interval. Referring again to the process 1300 in FIG. 11, a probability of a “true” match can be estimated 1314 for a given interval D. Each interval can be weighted 1316 by the estimated probability that it is a “true” match. A weighted mean and a standard deviation of the log of the interval can be calculated 1318. The probability of a “true” match can be recalculated 1320 for a given interval.

A decision is made 1322 whether or not the calculated posterior probability that the interval corresponds to a “true” match is stable, i.e., not changing by more than a predetermined amount during successive iterations. If the calculated posterior probability is not stable (i.e., is changing by more than the predetermined amount), steps 1314, 1316, 1318, and 1320 are repeated. If the calculated posterior probability is stable, it is decided 1324 if the calculated posterior probability of a “true” match is nearly equal to 1 (e.g., about 0.9-about 0.99999). If the calculated posterior probability of a “true” match nearly equals 1, the interval is classified 1326 as corresponding to a “true” match. If the calculated posterior probability of a “true” match does not nearly equal 1, it is decided 1328 if the calculated posterior probability of a “true” match is nearly equal to 0 (e.g., about 0.00001-about 0.001). If the calculated posterior probability nearly equals 0, the interval is classified 1330 as corresponding to a “false” match. If the calculated posterior probability of a “true” match does not nearly equal 0, the interval is classified 1332 as corresponding to an “uncertain” match.

In some embodiments, the results of process 1300 can be accomplished by performing the described steps in a different order. In some embodiments, detection of true, false, or uncertain matches can be performed by adding steps to the data collecting and analyzing.

Referring to FIG. 12, a process 1400 describes a sequence of operations for determining a threshold to help classify each of a plurality of intervals that correspond to the time between two events as “true,” “false,” or “uncertain.” The operations, performed by the collector-analyzer 1108, are executed typically by the processor 1132. In some embodiments, the operations can be executed by multiple processors present in the collector-analyzer 1108. A plurality of intervals can be obtained 1402, in which each interval is associated with a device (e.g., a computer, a cell phone) and corresponds to a “true” match, a “false” match, or an “uncertain” match. The obtained data can be used to determine 1404 the browser type (e.g., Firefox®, Microsoft® Internet Explorer, Safari®, Chrome, or unknown) used to access the web page. The obtained data can also be used to determine 1406 the internet connection speed (e.g., fast, slow, or unknown). The obtained data can also be used to determine 1408 whether the first time (e.g., the browsing timestamp 1150) occurred before or after the second time (e.g., the content timestamp 1158). In some examples, the determinations 1406 and 1408 can be made by analyzing information in log files (e.g., browsing log 1124, aggregate browsing log 1116, content log 1118). In some examples, the determinations 1406 and 1408 can be made previously (e.g., estimate 1306 and determination 1310, respectively) and need not be determined again.

For each grouping of browser type, internet connection speed, and sequential order of first and second times, one or more thresholds can be determined 1410 that divide the plurality of intervals into classes corresponding to “true,” “false,” or “uncertain” matches. In some examples, a lower threshold can be determined, such that if a sample interval has a value between zero and the lower threshold, the sample interval would be classified as corresponding to a “true” match. In some examples, an upper threshold can be determined in addition to the lower threshold, such that if a sample interval has a value between the lower threshold and the upper threshold, the sample interval would be classified as corresponding to an “uncertain” match. If the sample interval has a value above the upper threshold, the sample interval would be classified as corresponding to a “false” match. The lower and upper thresholds can be determined such that misclassification of a sample interval is controlled. In some examples, the upper and lower thresholds can have the same value, while, in other examples, the upper and lower thresholds can have different values.

In some embodiments, the results of process 1400 can be accomplished by performing the described steps in a different order. In some embodiments, detection of true, false, or uncertain matches can be performed by adding steps to the data collecting and analyzing. The thresholds determined in the process 1400 can be derived from historical data. New data (e.g., intervals 1164 from merged data logs 1126) can be compared to the determined thresholds to classify the new data as corresponding to a “true” match, a “false” match, or an “uncertain” match.

The distribution parameters (e.g., a mean μ, a standard deviation σ) as well as the determined thresholds can be updated periodically (e.g., hourly, daily, weekly, monthly, bimonthly, every six months, every year). In some examples, the parameters and thresholds can be determined again using a combination of the historical data used previously and additional historical data received since the last determination of thresholds. In some examples, the process 1300 and the process 1400 can be rerun in part or in total. In some examples, classifications of new data can be recorded and an on-line algorithm can be used to adjust the distribution parameters and the determined thresholds at predetermined intervals (e.g., seconds, hours, days, weeks, months).

Referring to FIG. 13, a delta parameter (or time interval) can be calculated by subtracting the ad time from the web browsing time. The delta can be any value (positive, negative, or zero). For each browser type, there are four thresholds (e.g., from most positive to most negative: threshold A 1422, threshold B 1424, threshold C 1426, and threshold D 1428). When delta is compared to these four thresholds, there are five possible cases:

(1) If delta is greater than or equal to A, then we know it is definitely not a match (delta falls within the “false matches” region 1430 in FIG. 13);

(2) If delta is less than A but greater than or equal to B, then it is an uncertain match (delta falls within the “uncertain” region 1432);

(3) If delta is less than B but greater than or equal to C, then it is a certain match (delta falls within the “true matches” region 1434);

(4) If delta is less than C but greater than or equal to D, then it is an uncertain match (delta falls within the “uncertain” region 1436); and

(5) If delta is less than D, then it is definitely not a match (delta falls within the “false matches” region 1438).
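The five cases above can be sketched as a single Python function. This is a minimal illustration: the function name classify_delta and the threshold values in the usage note are hypothetical, and the thresholds are assumed to be ordered so that A ≥ B ≥ C ≥ D.

```python
def classify_delta(delta: float, a: float, b: float,
                   c: float, d: float) -> str:
    """Classify delta (browsing time minus ad time) against four thresholds.

    The thresholds run from most positive (a) to most negative (d).
    """
    if not (a >= b >= c >= d):
        raise ValueError("thresholds must satisfy a >= b >= c >= d")
    if delta >= a:
        return "false"       # case (1): region 1430
    if delta >= b:
        return "uncertain"   # case (2): region 1432
    if delta >= c:
        return "true"        # case (3): region 1434
    if delta >= d:
        return "uncertain"   # case (4): region 1436
    return "false"           # case (5): region 1438
```

For instance, with illustrative thresholds A = 10, B = 8, C = −3, and D = −5 seconds, a delta of 2 seconds is classified as “true,” a delta of 9 seconds as “uncertain,” and a delta of −20 seconds as “false.” In practice a separate set of four thresholds would be kept for each browser type.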

In some examples, the potential “true” matches can be for Firefox® users who have fast internet connections and in which the browsing timestamp 1150 is after the content timestamp 1158. If the match is determined to be “false” or not a match, the corresponding interval D between the two timestamps has been classified as coming from a uniform distribution in which the probability for each sample is equal to 1/(maximum_interval − minimum_interval). For example, if intervals between ten minutes before or after an event were considered, the corresponding uniform distribution would be confined to (0, 600] seconds, and D would have a constant density with height 1/600 on (0, 600] seconds for a “false” match. If the match were true, D has been determined to have a non-uniform distribution having a peak at zero and a long right tail. Both gamma and lognormal distributions have these properties and have no upper bound. In practice, these distributions give little probability far out in the tail so that they may be considered to be effectively bounded.

By using the system 1100, it has been shown that the question of whether or not a user (e.g., user 1106 b) was accessing a web page while the content 1112 was displayed on the web page is equivalent to answering whether a given sample of an interval D is more likely to be a sample from a uniform distribution (which corresponds to a “false” match) or a non-uniform distribution (which corresponds to a “true” match), or if it is uncertain whether the interval D is a sample from a uniform or a non-uniform distribution.

As illustrated in equation 1, the EM algorithm has a Bayesian interpretation. Each single observation, interval D, has a prior probability, or P(match), of being a “true” match. (In the end, a large range (e.g., 0.1-0.9) of prior probabilities was tried, and the choice of prior moved the determined threshold by no more than 0.1 seconds.) If the interval corresponds to a “true” match, then D has a non-uniform (e.g., lognormal, gamma) density; otherwise, the interval D has a uniform density.

If the parameters (e.g., the mean and standard deviation) of the non-uniform distribution were known, then after observing D, the posterior probability of a “true” match could be computed using Bayes' theorem as given in equation 1. As the parameters are unknown, they can be estimated iteratively. Given the current estimates of P(match|D), each observation is weighted by the probability that it is a “true” match, and a weighted mean and standard deviation of the log distances log(D) are computed. These would be the maximum likelihood estimates of (μ, σ) if P(match|D) were correct. Next, if needed, the collector-analyzer 1108 can recalculate P(match|D) and re-estimate the lognormal parameters, stopping when the posterior probabilities are no longer changing by an amount large enough to considerably alter the estimation.

After the algorithm has converged, each observation or interval D has its own P(match|D). If the probability is nearly one, then it is reasonable to assume that the user has accessed a web page while the content 1112 was displayed on the web page. If the probability is nearly zero, then it is reasonable to assume that it is a “false” match and that the user has not accessed a web page while the content 1112 was displayed on the web page. Otherwise, the true/false classification of the match is uncertain, and it cannot be safely assumed whether or not the user accessed a web page while the content 1112 was displayed on the web page.
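The iterative estimation and the final classification described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure actually run by the collector-analyzer 1108: it assumes a lognormal density for “true” matches, a uniform density on (0, max_d] for “false” matches, a fixed prior, strictly positive intervals, and cutoffs of 0.99 and 0.001 for “nearly one” and “nearly zero.” All function and parameter names are hypothetical.

```python
import math
from typing import List, Tuple


def em_lognormal_uniform(intervals: List[float], prior_true: float = 0.5,
                         max_d: float = 600.0, tol: float = 1e-6,
                         max_iter: int = 200
                         ) -> Tuple[float, float, List[float]]:
    """Estimate (mu, sigma) of log(D) and the posterior P(match|D) per interval."""
    logs = [math.log(d) for d in intervals]   # requires every d > 0
    n = len(logs)
    # Starting guess: unweighted mean and standard deviation of log(D).
    mu = sum(logs) / n
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in logs) / n), 1e-6)
    post = [prior_true] * n
    false_term = (1.0 - prior_true) / max_d   # uniform "false" component
    for _ in range(max_iter):
        # Recompute P(match|D) for each interval from the current (mu, sigma).
        new_post = []
        for d, x in zip(intervals, logs):
            z = (x - mu) / sigma
            density = math.exp(-0.5 * z * z) / (
                d * sigma * math.sqrt(2.0 * math.pi))
            true_term = prior_true * density
            new_post.append(true_term / (true_term + false_term))
        total = sum(new_post)
        if total == 0.0:
            break  # no interval supports a "true" match; keep last estimates
        # Weighted mean and standard deviation of log(D).
        mu = sum(w * x for w, x in zip(new_post, logs)) / total
        var = sum(w * (x - mu) ** 2 for w, x in zip(new_post, logs)) / total
        sigma = max(math.sqrt(var), 1e-6)
        # Stop once the posteriors are stable.
        change = max(abs(p - q) for p, q in zip(new_post, post))
        post = new_post
        if change < tol:
            break
    return mu, sigma, post


def classify_posterior(p: float, near_one: float = 0.99,
                       near_zero: float = 0.001) -> str:
    """Map a posterior probability to "true", "false", or "uncertain"."""
    if p >= near_one:
        return "true"
    if p <= near_zero:
        return "false"
    return "uncertain"
```

Holding the prior fixed mirrors the description above, in which a range of priors was tried rather than estimated; a fuller mixture fit could update the prior in each iteration as well.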

Example Data Analysis

The procedure described above was applied separately to various combinations of Firefox® and MSIE browser types crossed with fast, slow, and unknown speeds, treating positive and negative intervals separately, and to data representing unknown browser type ignoring speed.

FIG. 14 displays example histograms of intervals calculated by the collector-analyzer 1108 for merged data logs 1126 formed by combining sample content logs 1118 with aggregate browsing logs 1116. IDs 1154 and 1168 were identified that correspond to users who possibly accessed a website within a ten minute interval of the timestamp 1158, which is when the content 1112 was displayed. While a ten minute window is too large to contain only “true” matches, having a long window helps to understand how long the window should be in order to contain mostly “true” matches.

In the example shown in FIG. 14, there were over 300,000 potential “true” matches for the merged data logs 1126. These potential matches can be subdivided into different groups based on the determined web browser type and internet connection speed. For example, membership in these groups can be expressed as a percentage of the total number of potential matches. Of the over 300,000 potential “true” matches, about 72% used Firefox®, about 28% used MSIE, and the web browser 1122 was unidentified for about 0.09%. About 5% of both the MSIE and Firefox® browser users were determined to have a slow connection. Only about 9% of the MSIE users had an unknown internet connection speed, while about 25% of the Firefox® users had an unknown internet connection speed. Histograms in FIG. 14 are grouped according to the determined browser type (e.g., browser type 1502) and the determined internet connection speed (e.g., internet connection speed 1504). Each histogram shows the positive and negative intervals between the timestamp 1158 of when the content 1112 was displayed and the timestamp 1150 of when the closest browsing event was recorded in the browsing log 1124. The abscissa of each histogram is in units of seconds.

The tails of the histograms in FIG. 14 are relatively flat and, except for users of MSIE over a slow connection, pass a test for uniformity. Also, the peaks of the windows, where we would expect most of the matches to lie, lie near zero and are much higher than the tails, suggesting that it should be possible to identify some instances of users who were presented with content with high confidence. Even at this scale, which shows only the coarsest features, it is clear that MSIE users with slow connections have larger intervals than other users. The histogram for the MSIE users with unknown connection speeds looks like a mixture of the MSIE fast and slow histograms; the histogram for the Firefox® users with an unknown connection speed has a smaller peak.

FIG. 15 focuses on the fast and slow, MSIE and Firefox® histograms of FIG. 14 (i.e., the two middle and two leftmost histograms) and plots the absolute intervals on a log scale. All intervals plotted in FIG. 15 have been determined to occur when the browsing timestamp 1150 is before the content timestamp 1158. These plots, which use the same identifiers of browser type 1502 and internet connection speed 1504, focus on short intervals between the first time (e.g., the browsing timestamp 1150) and the second time (e.g., the content timestamp 1158), which are the ones most likely to correspond to “true” matches rather than “false” matches. On this scale, in this example, the tails increase exponentially for fast MSIE connections and all Firefox® connections, except in the very last bin.

A preliminary analysis suggested that browsing timestamps 1150 that occur within 10 seconds of a content timestamp 1158 should be considered “true” matches, and greater intervals should be considered coincidences. The vertical lines 1606 a-d in the histograms shown in FIG. 15 are drawn at 10 seconds, or about 2.2 on the log scale in FIG. 15. This seems like a reasonable boundary between “true” and “false” matches for MSIE users on a fast connection, but not for Firefox® users. There is no peak in the intervals for Firefox® users, suggesting that none of the matches for Firefox® can safely be assumed to be true when the browsing timestamp 1150 precedes the content timestamp 1158. Only 2% of all Firefox® users would be affected by this cutoff in this study, but removing them from the “true” match group seems likely to reduce the false positive rate by 0.02.

FIG. 16 presents the same types of histograms as shown in FIG. 15 except that all intervals plotted in FIG. 16 have been determined to occur when the browsing timestamp 1150 is after the content timestamp 1158. The opposite conclusion is reached when the content timestamp 1158 is first: the 10 second cutoff is reasonable for Firefox® users but seems to include mainly “false” matches for MSIE users; 3% of MSIE users in this example have positive intervals within the ten second window. Overall, 2.4% of the users in this study appear to be obvious false positives (i.e., identified as a “true” match when they are in fact a “false” match) with a two-sided, ten-second window.

Referring to FIG. 17, portions of four histograms shown in FIG. 14 are presented: a zoomed view of the negative intervals (i.e., the browsing timestamp 1150 occurs before the content timestamp 1158) for MSIE and the positive intervals (i.e., the browsing timestamp 1150 occurs after the content timestamp 1158) for Firefox®. Vertical lines 1806 a-d are drawn and labeled for the upper left figure at specific intervals: line 1806 a is at 1 minute, line 1806 b at 2 minutes, line 1806 c at 4 minutes, and line 1806 d at 5 minutes. The respective vertical lines in the other histograms are placed at intervals that correspond to lines 1806 a-d.

Standard algorithms (e.g., an expectation-maximization or “EM” algorithm) can be used to determine from which of two mixture components (i.e., a uniform distribution and a non-uniform distribution) an observation is drawn. For example, an EM algorithm works well on problems like this one in which one of the mixture components is known (i.e., the “false” match is drawn from a uniform distribution) and the two distributions have very dissimilar shapes.
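As an illustration only, the following sketch fits such a two-component mixture with EM. It assumes that the “false” matches are uniform on the interval from zero to a known upper bound and that the “true” matches follow a lognormal distribution whose parameters are estimated. The function names, the synthetic data, and the 600-second bound are hypothetical and are not taken from this document.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def posterior_true(x, prior_true, mu, sigma, upper):
    """Posterior probability that interval magnitude x is a "true" match."""
    true_part = prior_true * lognormal_pdf(x, mu, sigma)
    false_part = (1.0 - prior_true) / upper  # known uniform density
    return true_part / (true_part + false_part)


def fit_mixture(intervals, upper, prior_true=0.5, iterations=100):
    """EM for a lognormal ("true") plus fixed uniform ("false") mixture."""
    logs = [math.log(x) for x in intervals]
    n = len(logs)
    # Crude starting point: treat every observation as lognormal.
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    for _ in range(iterations):
        # E-step: responsibility of the "true" component for each interval.
        resp = [posterior_true(x, prior_true, mu, sigma, upper)
                for x in intervals]
        total = sum(resp)
        # M-step: update the mixing weight and lognormal parameters only;
        # the uniform component is known and stays fixed.
        prior_true = total / n
        mu = sum(r * v for r, v in zip(resp, logs)) / total
        var = sum(r * (v - mu) ** 2 for r, v in zip(resp, logs)) / total
        sigma = max(math.sqrt(var), 1e-6)
    return prior_true, mu, sigma


# Synthetic interval magnitudes in seconds (illustrative, not study data).
random.seed(0)
UPPER = 600.0
true_intervals = [random.lognormvariate(1.0, 0.5) for _ in range(500)]
false_intervals = [UPPER * (1.0 - random.random()) for _ in range(500)]

prior, mu, sigma = fit_mixture(true_intervals + false_intervals, UPPER)
print(prior, mu, sigma)
print(posterior_true(2.0, prior, mu, sigma, UPPER))
print(posterior_true(300.0, prior, mu, sigma, UPPER))
```

Because the two component shapes are very dissimilar, the posterior for most intervals in this sketch ends up close to one or close to zero, and a threshold can be read off as the interval at which the posterior crosses a chosen level.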

Referring to FIG. 18, a table 1900 lists example combinations of browser type, internet connection speed, sequence of times (e.g., whether the first time occurred before or after the second time, the first time being the browsing timestamp 1150 and the second time being the content timestamp 1158), and a threshold that was determined or calculated for these conditions (e.g., as described in the process 1400). Example browser types in table 1900 include Firefox®, Microsoft® Internet Explorer, and unknown. Example internet connection speeds include fast, slow, and unknown.

The thresholds for most conditions were stable across choice of prior probabilities (e.g., between 0.1 and 0.9), and a lognormal fit is reasonable, except in the right tail where it has only a few samples. This may lead to a higher percentage of false negatives for larger distances, but because there is only a small number of samples in the tail of the lognormal, the error is not significant. Moreover, the region of uncertainty in the “nice” conditions (e.g., positive distances for Firefox® internet browsers, negative distances for MSIE internet browsers) is small, never more than 0.2 seconds, and so is not a significant source of error. This means that each potential “true” match has a posterior probability of being a “true” match that is either nearly one or nearly zero. The nice conditions and the thresholds separating true and false matches are shown in table 1900. The unknown browser type is included in the nice category because, even though there is a relatively large uncertain region (mostly due to a lack of data in this region), the lognormal seems to fit well the data that are likely to be exposures.

As shown in table 1900, for a set of intervals from Firefox® internet browsers, it was determined that, regardless of internet connection speed, if the first time (e.g., the browsing timestamp 1150) occurs after the second time (e.g., the content timestamp 1158) (i.e., the first time is greater than the second time), no threshold was found to separate adequately the “true” matches from the “false” matches. Similarly, for a set of intervals from Microsoft® Internet Explorer internet browsers, it was determined that, regardless of internet connection speed, if the first time occurs before the second time (i.e., the first time is less than the second time), no threshold was found to separate adequately the “true” matches from the “false” matches. Additional analyses of more intervals in these categories may improve future threshold determination.

Also shown in table 1900, for a set of intervals in which the first time (e.g., the browsing timestamp 1150) occurs before the second time (e.g., the content timestamp 1158) (i.e., the first time is less than the second time), regardless of the type of internet browser used, an interval that is below a threshold of 6.3 seconds was classified as a “true” match. In addition, intervals between 6.3 seconds and 11.5 seconds were classified as “uncertain.” Similarly, for a set of intervals in which the first time occurs after the second time (i.e., the first time is greater than the second time), regardless of the type of internet browser used, an interval that is below a threshold of 7.7 seconds was classified as a “true” match. In addition, intervals between 7.7 seconds and 9.9 seconds were classified as “uncertain.” Additional analysis of more intervals from these categories may improve future categorization within the “uncertain” zone.
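The browser-independent thresholds just described can be applied as in the following minimal sketch. The names are hypothetical. The treatment of boundary values, the handling of an interval of exactly zero, and the labeling of intervals beyond the uncertain zone as “false” are assumptions made for illustration rather than details stated in this document.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    true_below: float       # magnitudes below this are "true" matches
    uncertain_below: float  # magnitudes below this (but not "true") are "uncertain"


# Values quoted in the description, in seconds, regardless of browser type.
BROWSING_FIRST = Thresholds(true_below=6.3, uncertain_below=11.5)
BROWSING_LAST = Thresholds(true_below=7.7, uncertain_below=9.9)


def classify(interval_seconds: float) -> str:
    """Classify a signed interval (browsing minus content timestamp).

    A negative interval means the browsing timestamp came first. An
    interval of exactly zero is handled with the BROWSING_LAST thresholds
    (an assumption).
    """
    limits = BROWSING_FIRST if interval_seconds < 0 else BROWSING_LAST
    magnitude = abs(interval_seconds)
    if magnitude < limits.true_below:
        return "true"
    if magnitude < limits.uncertain_below:
        return "uncertain"
    return "false"  # assumption: beyond the uncertain zone is a coincidence


for value in (-3.0, -8.0, -12.0, 5.0, 8.5, 10.0):
    print(value, classify(value))
```

Keeping the thresholds in a small lookup structure makes it straightforward to extend the sketch to per-browser and per-connection-speed entries such as those listed in table 1900.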

FIG. 19 is a schematic representation of a general computing system 2000 that can be used to implement the advertisement management system 104, search engine 112, collector-analyzer 1108, page link analysis server 1102, content server 1101, or their components. Computing device 2000 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described or claimed in this document.

Referring to FIG. 19, the computing device 2000 includes a processor 2002, memory 2004, a storage device 2006, a high-speed interface 2008 connecting to memory 2004 and high-speed expansion ports 2010, and a low speed interface 2012 connecting to low speed bus 2014 and storage device 2006. The components 2002, 2004, 2006, 2008, 2010, and 2012 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 2002 can process instructions for execution within the computing device 2000, including instructions stored in the memory 2004 or on the storage device 2006 to display graphical information for a GUI on an external input/output device, such as display 2016 coupled to high speed interface 2008. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 2000 may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 2004 stores information within the computing device 2000. In one implementation, the memory 2004 is a volatile memory unit or units. In another implementation, the memory 2004 is a non-volatile memory unit or units. The memory 2004 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 2006 is capable of providing mass storage for the computing device 2000. In one implementation, the storage device 2006 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in an information carrier. The computer program product may also contain instructions that, when executed, perform one or more methods, such as those described above. The information carrier is a computer- or machine-readable medium, such as the memory 2004, the storage device 2006, memory on processor 2002, or a propagated signal.

The high speed controller 2008 manages bandwidth-intensive operations for the computing device 2000, while the low speed controller 2012 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In one implementation, the high-speed controller 2008 is coupled to memory 2004, display 2016 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 2010, which may accept various expansion cards (not shown). In the implementation, low-speed controller 2012 is coupled to storage device 2006 and low-speed expansion port 2014. The low-speed expansion port, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 2000 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 2020, or multiple times in a group of such servers. It may also be implemented as part of a rack server system 2024. In addition, it may be implemented in a personal computer such as a laptop computer 2022. Each of such devices (e.g., standard server, rack server system, personal computer, laptop computer) may contain one or more computing devices 2000, and an entire system may be made up of multiple computing devices 2000 communicating with each other.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse, trackball, touch-sensitive screen, or iDrive-like component) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

A number of implementations and examples have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, various forms of the flows shown above may be used, with steps re-ordered, added, or removed. Also, although several applications and methods have been described, it should be recognized that numerous other applications are contemplated. For example, while this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate examples can also be implemented in combination in a single example. Conversely, various features that are described in the context of a single example can also be implemented in multiple examples separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the examples described above should not be understood as requiring such separation in all examples, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems.

The techniques for matching timestamps described above can be used in joining two or more logs of any type, including logs that are different from those described above. Accordingly, other implementations are within the scope of the following claims.

1. An apparatus comprising: one or more server computers configured to perform operations comprising: providing a plurality of gadgets for processing data for display on a user device, each gadget being instructions for embedding in a web page, and each gadget being operable to perform operations comprising: collecting, from a reporting process that receives information from a plurality of sources, information about an advertisement as the information is updated by the reporting process during an ad campaign, the collecting comprising: accessing advertisement data logs from a first source for the advertisement, the advertisement data logs specifying timestamps of presentation events for the advertisement on web pages and Internet protocol (IP) addresses associated with the presentation events; accessing browsing data logs from a second source for the web pages, the browsing data logs specifying timestamps of web page view events for the web pages, IP addresses associated with the web page view events, and a user ID that is not included in the advertisement data logs; processing the collected information, including determining one or more performance metrics of the advertisement, the processing comprising: for pairs of an advertisement data log entry and a browsing data log entry that have matching IP addresses, determining a time interval for the pair based at least in part on a difference between the timestamp in the advertisement data log entry of the pair and the timestamp in the browsing data log entry of the pair; for each of at least some of the pairs, determining that the advertisement was presented to a user corresponding to the user ID on a respective one of the web pages in response to determining that the respective time interval satisfies a predetermined threshold corresponding to a maximum time interval associated with presentation of the advertisement on the respective one of the web pages; in response to determining that the advertisement was presented to the user on the respective one of the web pages, associating the user ID from the browsing data log entry with the presentation events for the advertisement; and outputting a visual representation of the processed information as the information is updated by the reporting process; and providing a dashboard for display on the user device, the dashboard operable to present the visual representations output from the gadgets in an integrated user interface.
 2. The apparatus of claim 1 in which two or more of the gadgets are operable to output visual representations of information on effectiveness of corresponding two or more advertisements.
 3. The apparatus of claim 2 in which the dashboard shows the two or more advertisements sorted according to the effectiveness of the advertisements.
 4. The apparatus of claim 1 in which at least one of the gadgets is operable to process private data at a client site and implements a security procedure to prevent unauthorized access to the private data.
 5-6. (canceled)
 7. The apparatus of claim 1 in which the information comprises at least one of data indicating a performance of the advertisement, statistical data associated with the advertisement, or data indicating recognition of a brand associated with the advertisement.
 8. The apparatus of claim 1 in which the one or more performance metrics comprise at least one of brand health, campaign effectiveness, competitive brand tracking, market research, offline ad effectiveness, or mix media recommendation.
 9. The apparatus of claim 1 in which at least two gadgets communicate with each other such that a gadget is updated automatically in response to a change in another gadget.
 10. The apparatus of claim 1 in which the gadgets comprise a first gadget that processes data output from a second gadget and a third gadget to generate combined data for output.
 11. The apparatus of claim 10 in which the first gadget correlates the output from the second gadget with the output from the third gadget to identify a correlation between the outputs from the second and third gadgets.
 12. The apparatus of claim 1, further comprising one or more application programming interfaces (APIs) for exchanging data among the gadgets.
 13. The apparatus of claim 1, further comprising one or more application programming interfaces (APIs) for exporting data from the gadgets or importing data to the gadgets.
 14. The apparatus of claim 1 in which the dashboard presents the output from various gadgets in the web page.
 15. The apparatus of claim 1 in which the integrated user interface displays at least one of text messages, charts, or graphs.
 16. A computer-implemented method for evaluating advertisements, comprising: providing, by at least one server computer, a plurality of gadgets processing data for display on a user device, each gadget being instructions for embedding in a web page, and each gadget being operable to perform operations comprising: collecting, from a reporting process that receives information from a plurality of sources, information about an advertisement as the information is updated by the reporting process during an ad campaign, the collecting comprising: accessing advertisement data logs from a first source for the advertisement, the advertisement data logs specifying timestamps of presentation events for the advertisement on web pages and Internet protocol (IP) addresses associated with the presentation events; accessing browsing data logs from a second source for the web pages, the browsing data logs specifying timestamps of web page view events for the web pages, IP addresses associated with the web page view events, and a user ID that is not included in the advertisement data logs; processing the collected information, including determining one or more performance metrics of the advertisement, the processing comprising: for pairs of an advertisement data log entry and a browsing data log entry that have IP addresses that match, determining a time interval for the pair based at least in part on a difference between the timestamp in the advertisement data log entry of the pair and the timestamp in the browsing data log entry of the pair; for each of at least some of the pairs, determining that the advertisement was presented to a user corresponding to the user ID on a respective one of the web pages in response to determining the respective time interval satisfies a predetermined threshold corresponding to a maximum time interval associated with presentation of the advertisement on the respective one of the web pages; and in response to determining that the advertisement was presented to the user on the respective one of the web pages, associating the user ID with the presentation events for the advertisement; and outputting a visual representation of the processed information as the information is updated by the reporting process; and providing a dashboard for display on the user device, the dashboard operable to present the visual representations output from the gadgets in an integrated user interface.
 17. A non-transitory storage device storing instructions operable to cause one or more processors to perform operations comprising: providing, from one or more server computers, a plurality of gadgets for processing data for display on a user device, each gadget being instructions for embedding in a web page, and each gadget being operable to perform operations comprising: collecting, from a reporting process that receives information from a plurality of sources, information about an advertisement as the information is updated by the reporting process during an ad campaign, the collecting comprising: accessing advertisement data logs from a first source for the advertisement, the advertisement data logs specifying timestamps of presentation events for the advertisement on web pages and Internet protocol (IP) addresses associated with the presentation events; accessing browsing data logs from a second source for the web pages, the browsing data logs specifying timestamps of web page view events for the web pages, IP addresses associated with the web page view events, and a user ID that is not included in the advertisement data logs; processing the collected information, including determining one or more performance metrics of the advertisement, the processing comprising: for pairs of an advertisement data log entry and a browsing data log entry that have IP addresses that match, determining a time interval for the pair based at least in part on a difference between the timestamp in the advertisement data log entry of the pair and the timestamp in the browsing data log entry of the pair; for each of at least some of the pairs, determining that the advertisement was presented to a user corresponding to the user ID on a respective one of the web pages in response to determining the respective time interval satisfies a predetermined threshold corresponding to a maximum time interval associated with presentation of the advertisement on the respective one of the web pages; in response to determining that the advertisement was presented to the user on the respective one of the web pages, associating the user ID with the presentation events for the advertisement; and outputting a visual representation of the processed information as the information is updated by the reporting process; and providing a dashboard for display on the user device, the dashboard operable to present the visual representations output from the gadgets in an integrated user interface.
 18. The device of claim 17, wherein the one or more performance metrics comprise at least one of brand health, campaign effectiveness, competitive brand tracking, market research, offline ad effectiveness, or mix media recommendation.
 19. The device of claim 17, wherein at least two gadgets communicate with each other such that a gadget is updated automatically in response to a change in another gadget.
 20-40. (canceled)
 41. The method of claim 16, comprising: providing supporting layers for the gadgets, the supporting layers comprising: a first layer configured to obtain, from at least two data sources, raw data indicating effectiveness of the advertisement; and a second layer configured to analyze data output of the first layer, including determining trends of the data output from the first layer; and an application programming interface (API) for the gadgets to access the analyzed data to generate the performance metrics, wherein the performance metrics include the trends of the data.
 42. The method of claim 41, wherein analyzing the data output of the first layer comprises: combining raw data from each of the at least two data sources; and measuring effectiveness of the advertisement based on the combined raw data.
 43. (canceled)
 44. The method of claim 16, wherein processing the collected information comprises merging advertisement data log entries and browsing data log entries based on the respective IP addresses and the respective timestamps in advertisement data log entries and browsing data log entries to generate pairs of advertisement data log entries and browsing data log entries.
 45. The method of claim 44, wherein measuring effectiveness of the advertisement comprises determining a web browsing history of a user, the web browsing history indicating whether the advertisement is presented to the user.
 46. (canceled)
 47. The method of claim 16, wherein each gadget is a user-uploaded application program.
 48. The method of claim 16, further comprising: for each of the pairs, selecting the predetermined threshold based at least in part on a browser type and internet connection speed for a user device with the matching IP address of the pair; and wherein determining that the advertisement is presented to the user comprises determining that the advertisement is presented to the user on the respective one of the web pages in response to determining the respective time interval satisfies the selected predetermined threshold. 